Workshops
Workshop on Large Foundation Models in Biology and Biomedicine
The rapid evolution of Large Foundation Models (LFMs) has transformed the landscape of biomedical research, clinical decision-making, and healthcare innovation. From decoding complex biological interactions to assisting in diagnosis and drug discovery, LFMs have demonstrated remarkable potential across a broad spectrum of biomedical applications. However, their adaptation to this highly specialized and sensitive domain presents unique challenges ranging from data scarcity and heterogeneity to issues of interpretability, fairness, and reproducibility. The Workshop on Large Foundation Models for Biology and Biomedicine (LFMBio 2026) aims to bring together researchers, practitioners, and industry experts to advance the science and practice of applying LFMs to biomedical problems. The workshop will serve as a forum for presenting original research, fostering interdisciplinary dialogue, and exploring cutting-edge innovations in model design, multimodal integration, trustworthiness, and ethical deployment. We invite contributions that span foundational model development, performance optimization, knowledge representation, real-world clinical applications, and the societal impact of these powerful technologies.
6th Real-World Surveillance: Applications and Challenges
Computer vision methods trained on public databases exhibit performance drift when deployed for real-world surveillance, compared to their initial results on the test sets of those databases. In this workshop, we are interested in papers reporting experimental results on any application of computer vision in real-world surveillance and object security, including the protection of buildings and facilities within critical infrastructure, the challenges encountered, and the corresponding mitigation strategies, on topics including but not limited to:
- Object detection
- Tracking
- Action recognition
- Scene understanding
- Super-resolution
- Multi-modal surveillance
Furthermore, the workshop pays special attention to legal and ethical issues of computer vision applications in real-world scenarios. We therefore also welcome papers describing methodology and experimental results on legal matters (such as the GDPR, the AI Act, and the US Executive Order on AI) or ethical concerns (such as detecting bias towards gender, race, or other characteristics, and strategies for mitigating it). We particularly encourage submissions addressing safety, reliability, and regulatory compliance for critical infrastructure protection, as well as privacy-preserving approaches in high-security environments. The workshop also hosts a competition on robust thermal-image object detection. We have run this workshop four times previously at WACV (2022-2025) and once at ECCV (2022).
SAFE 2026 – Synthetic & Adversarial ForEnsics
The rise of generative AI and foundation models presents new challenges for ensuring robustness against synthetic and adversarial media. Research in adversarial machine learning has shown that detection systems can be bypassed with subtle perturbations, enabling malicious content that undermines societal trust and national security. This workshop offers a venue for advancing work at the intersection of synthetic media forensics and adversarial robustness, with a focus on provenance analysis, fingerprinting, authenticity verification, and resilience across diverse generative architectures. Expected outcomes include a taxonomy of joint synthetic–adversarial threats, benchmark resources for evaluation, and stronger collaboration between technical, forensics, and policy communities.
Pixels to Patients: Bridging CV State-of-Art with Clinical Impact
Clinical medicine represents one of the most demanding testbeds for computer vision, where methods must function reliably under distribution shifts, strict regulatory constraints, and high social impact. Despite rapid progress in foundation models, multimodal learning, and self-supervision, a persistent gap remains between state-of-the-art CV research and its safe, effective deployment in clinical practice. This workshop, 'Pixels to Patients: Bridging CV State-of-the-Art with Clinical Impact', will bring together leading researchers, clinicians, and industry partners to address this gap head-on. Core themes include adapting recent CV breakthroughs (foundation and vision-language models, domain adaptation, and data-efficient learning) to healthcare, and exporting lessons from clinical deployment (bias auditing, monitoring, and regulatory compliance) back to the broader CV community. The program will feature keynotes from pioneers in medical AI, oral and poster sessions, a Problem-Pitch track to seed benchmark challenges, and a panel on real-world deployment and interoperability. By positioning medicine not as a silo but as a proving ground for trustworthy, generalizable vision systems, this workshop aims to catalyze advances that resonate across safety-critical domains, from healthcare to robotics and beyond.
4th Workshop on Computer Vision for Winter Sports
The workshop invites paper submissions focusing on the analysis and interpretation of images and videos captured during winter sports and related summer activities such as mountain sports (e.g., mountaineering, downhill biking, climbing). Topics include video understanding, pose and performance analysis, injury prevention, trajectory and scene reconstruction, crowd monitoring, AR/VR for fan engagement, and dataset creation. We also welcome work addressing challenges such as harsh weather, real-time processing, multimodal fusion, and camera pose estimation in broadcast videos.
HARVEST-Vision: International Workshop on Applications of CV and HPC in Agriculture
The rapid growth of computer vision (CV) and artificial intelligence (AI) is reshaping agriculture, offering new approaches to challenges in food security, climate resilience, and sustainability. Scaling these technologies requires bridging core CV research with domain-specific data infrastructures and practical user needs. Key applications include crop and soil monitoring, pest and disease detection, yield forecasting, and resource optimization. Yet these tasks are hindered by heterogeneous field data, limited labeled datasets, especially in edge environments, and the demand for models that are both scalable and interpretable.
Recognizing these challenges, we launched the HARVEST workshop series (the first edition hosted at the 54th International Conference on Parallel Processing in San Diego, CA, supported by the NSF ICICLE AI Institute led by The Ohio State University) to build community and cyberinfrastructure at the intersection of AI, HPC, and agriculture. The event showcased high-profile keynote speakers and engaged around 30 participants from academia, national labs, and industry. Our agenda included technical talks, panels, and hands-on demonstrations. Through a partnership with the NSF AI Institutes Virtual Organization, we were able to secure travel awards for 12 rising academic researchers, many of whom are now part of a growing expert network of computer scientists and agricultural experts.
Bringing this momentum to WACV, our HARVEST-Vision workshop directly aligns with the community's focus on impactful real-world CV: agricultural cyberinfrastructure is an urgent, societally relevant application space for advances in robust perception, domain adaptation, scalable datasets, and explainable, deployable AI.
By fostering interdisciplinary collaborations, catalyzing new datasets, benchmarks, and reproducible pipelines, and supporting diversity via travel awards, our workshop will expand WACV’s reach, accelerating knowledge transfer and enabling high-impact computer vision research for agriculture and beyond.
International Workshop on Smart Waste Monitoring (WasteVision)
The growing global concern around waste management, illegal dumping, and environmental pollution highlights the urgent need for intelligent monitoring solutions. Advances in computer vision, guided by the impressive progress in artificial intelligence technologies, offer promising opportunities to address these challenges. However, the scientific literature in this field points out that key research gaps remain, including the lack of robust detection methods for diverse environments, limited datasets and benchmarks, and the need for solutions that can be deployed in real systems with limited computational resources running in real time.
The International Workshop on Smart Waste Monitoring (WasteVision) will provide a unique forum for researchers and companies to present and discuss novel contributions in this emerging field. The workshop seeks to advance the state of the art in smart waste monitoring, illegal dumping detection, and environmental pollution surveillance while fostering interdisciplinary collaboration.
We invite original research contributions in (but not limited to) the following areas:
- Image analysis for waste detection and classification
- Video analysis for waste tracking and management
- Computer vision methods for detecting illegal waste disposal
- Multimodal systems for dumping identification
- Remote sensing and UAV-based waste monitoring
- Video and image analytics for pollution tracking
- Datasets and benchmarks for waste and pollution monitoring
- Applications and case studies (real-world deployments in urban and rural contexts)
The workshop will also host the first edition of the Illegal Waste Dumping Detection (IWDD) contest, in which the participants will receive a novel dataset for training their approaches for illegal waste dumping detection: the methods will be evaluated on a private test set, in order to ensure a fair comparison on unpublished videos.
We welcome scientific papers describing the methods and results obtained during the contest, including innovative approaches, system designs, and comparative analyses. We will accept submissions of regular papers of more than 5 pages, which must follow the same formatting guidelines as the main conference. All accepted papers will be published in the proceedings alongside the main conference. More details will be provided on the workshop website.
EVGEN - Event-based Vision in the Era of Generative AI - Transforming Perception and Visual Innovation Summary
The rapid convergence of event-based vision and generative artificial intelligence (Gen-AI) offers unprecedented opportunities to redefine perception and visual innovation. Event cameras provide asynchronous, high-temporal-resolution data that complements traditional frame-based sensing, while generative models have revolutionized content synthesis, restoration, and reasoning across modalities. The Event-based Vision in the Era of Generative AI (EVGEN 2026) workshop will serve as a forum to explore this synergy, addressing topics such as video generation and interpolation, motion deblurring and prediction, multimodal sensor fusion, gesture reconstruction, and applications in autonomous systems. By fostering dialogue among researchers from neuromorphic vision, computer vision, robotics, and AI, the workshop aims to inspire novel methods, highlight emerging applications, and chart new research directions. EVGEN 2026 will feature invited talks, poster sessions, lightning talks, and a panel discussion, bringing together leading experts and early-career researchers to shape the future of event-driven perception enhanced by generative models.
VisionDocs: 3rd Workshop on Computer Vision Systems for Document Analysis and Recognition
The rapid digitization of textual and visual information has made automated document analysis increasingly critical across industrial, scientific, and cultural domains. Despite advances in computer vision, research largely focuses on limited document types and tasks, leaving challenges such as heterogeneous formats, low-resource languages, non-standard layouts, and historical documents largely unaddressed. This workshop aims to advance document understanding by exploring cutting-edge approaches, including generative modeling, self-supervised learning, multimodal fusion, and few-shot adaptation. By fostering collaboration between computer vision researchers and domain experts, it seeks to promote solutions for generalization across diverse document types and low-data scenarios. Currently underrepresented at major computer vision venues, document analysis will benefit from a dedicated forum connecting the WACV and ICDAR communities. VisionDocs will showcase state-of-the-art methods, stimulate cross-disciplinary exchange, and define new research directions, advancing both scientific understanding and practical AI applications in document analysis.
The Second Workshop on Computer Vision for Geospatial Image Analysis
Motivation and Impact: There is a growing need for venues that foster richer collaboration between the computer vision and geospatial image analysis communities. By being collocated with WACV (a top-tier computer vision conference at the cutting edge of computer vision applications), our proposed workshop will provide a platform for algorithm developers and computer vision researchers, as well as researchers in the geospatial imaging and image analysis communities, to come together and work on challenging problems of great social relevance. This workshop will build on the foundations of the very successful full-day GeoCV workshop at WACV 2025 (https://sites.google.com/view/geocv), which was very well received (5 keynote talks, 30 posters, 20 full papers in the proceedings, and a full room at the venue). For the 2026 edition of GeoCV, we propose to emphasize self-supervised learning for the training of large vision and multi-modal foundation models, domain generalization in the context of such models, and related concepts as they pertain to geospatial image analysis applications. We believe that this workshop would be a complementary addition to the WACV lineup of workshops by focusing on a niche but very important and emergent area. In addition to the 2025 edition of GeoCV, we have had a very positive experience hosting a workshop (MORSE 2025, https://sites.google.com/view/morse2025) that complemented other workshops and provided synergy in the remote sensing track (https://cvpr.thecvf.com/Conferences/2025/workshop-list) at CVPR 2025.
Expected Outcomes: We expect this workshop to serve as a platform for dissemination of the state of the art in computer vision and AI for geospatial imaging and its applications.
Additionally, by bringing together leading researchers from academia and industry to deliver talks on emerging algorithmic ideas, sensing capabilities, and research directions, we aim to provide a venue for cross-fertilization of ideas, furthering the impact computer vision can have in this rapidly growing area and accelerating the adoption of the latest trends and promising developments in computer vision for the analysis of geospatial imagery at scale. Our workshop represents a diverse team, spanning 5 countries across the globe. Additionally, our team and the planned speakers represent a wide range of career stages, from rising stars to established researchers. This workshop will also serve as a venue for students and postdocs to network with leading researchers in these emerging and important research areas. Details are provided in the enclosed proposal PDF file, following the WACV workshop proposal template.
Synthetic Realities and Data in Biometric Analysis and Security
5th Workshop on Image/Video/Audio Quality Assessment in Computer Vision, VLM and Diffusion Model
Image, video, and audio quality significantly impacts machine learning and computer vision systems, yet remains underexplored by the broader research community. Real-world applications, from streaming services and autonomous vehicles to cashier-less stores and generative AI, critically depend on robust quality assessment and improvement techniques. Despite their importance, most visual learning systems assume high-quality inputs, while in reality, artifacts from capture, compression, transmission, and rendering processes can severely degrade performance and user experience.
This workshop is particularly timely given the explosive growth of generative AI, which introduces new challenges in quality assessment for both inputs and outputs. By bringing together researchers from industry and academia, we aim to systematically investigate how quality issues affect various visual learning tasks and develop innovative assessment and mitigation techniques. Building on the success of our previous workshops at WACV (2022-2025), we expect to stimulate new research directions and attract more talent to this critical field, ultimately improving the robustness and reliability of computer vision applications across industries.
LENS: Learning and Exploitation of Latent Space Geometries
LENS brings together researchers studying the geometry of latent representations: their manifolds, Riemannian structures, intrinsic dimensions, and the implications for model design and evaluation. We aim to bridge advances in geometric learning with practical computer vision applications, fostering dialogue between theory and deployment.
We welcome contributions that deepen our understanding of latent spaces (e.g., curvature, geodesics, topology), propose geometry-aware architectures and objectives, or demonstrate how latent geometry can improve robustness, generalization, fairness, privacy, and efficiency in real-world vision systems.
3rd Workshop on Computer Vision for Earth Observation (CV4EO) Applications
The 3rd CV4EO is conceived as a platform to foster application-oriented, multidisciplinary interactions between the CV community and experts from geoscience domains, EO data providers, government agencies, stakeholders, and other organizations pairing CV and EO for decision-making in impactful applications such as disaster response, national security, and environmental protection. We propose a Full Day program comprising keynote talks, lightning talks & poster sessions.
Foundational Models Beyond the Visual Spectrum
The rapid rise of foundational models has transformed computer vision, but most progress has been confined to the visible spectrum. Many real-world applications in healthcare, maritime operations, biometrics, remote sensing, autonomous navigation, and defense rely on data modalities such as infrared, LIDAR, hyperspectral, depth, acoustic, event cameras, RF, or radar, where foundational models remain underexplored. This workshop aims to bring together researchers working on extending and adapting foundational models beyond the visual spectrum, addressing challenges such as cross-modal pretraining, data scarcity, and domain adaptation. The motivation is to bridge the gap between visible-spectrum advances and broader multimodal sensing, which is both timely and relevant to the WACV community as it expands toward embodied AI and real-world deployment. The expected impact of the workshop is twofold: (i) to catalyze new research directions by highlighting the unique opportunities and challenges of non-visual modalities, and (ii) to foster collaborations across academia, industry, and government working in these critical areas. We anticipate outcomes including a clearer community roadmap, new benchmarks, and broader awareness of the importance of foundational models beyond the visual spectrum.
3rd Physical Retail AI Workshop
Applications of vision-based Artificial Intelligence (AI) methods are increasingly present throughout society. Fueled by recent advances in Computer Vision, Deep Learning, web-scale training of vision and language models ("foundation models"), and edge compute, AI applications have expanded into a novel array of industries and products. In particular, the physical retail and grocery sectors have recently experienced an explosion of AI-enabled technologies, allowing for more efficient, effortless, and engaging experiences for shoppers, enabling the reduction of shrinkage for retailers, and providing insights on improving store efficiency, thereby reducing operational costs. Computer Vision applications are being deployed across numerous retail settings, including small convenience stores, large grocery stores, fashion stores, and shopping carts. The workshop series at WACV has already attracted a community of CV researchers who attend continuously, as well as 90+ participating teams in the GroceryVision challenges from around the world; we expect to expand to 100+ teams in 2026. The 3rd Physical Retail AI Workshop (PRAW) at WACV 2026 introduces the novel area of Computer Vision applications to Physical Retail and continues the previous successful workshops at WACV 2024, CVPR 2024, WACV 2025, and ICCV 2025. The proposed workshop is expected to produce approximately 6 full-length papers in the workshop proceedings, release a novel, publicly available dataset (the GroceryVision dataset) to the Computer Vision community, and help garner further attention and interest for WACV and its Workshops.
VReID-XFD: Video-based Human Recognition at Extreme Far Distances
Most existing benchmarks on UAV-based human recognition assume only moderate distances between cameras and subjects, and short time lapses between consecutive observations of subjects, enabling clothing-based cues to be used in recognition. This workshop, and its associated competition, aims at exactly the opposite scenarios: extreme far distances, with resolution limited to a few pixels, severe variations in pose, clothing changes, and strong environmental shifts.
To address these factors, we will use the recently created DetReIDX dataset (https://www.it.ubi.pt/DetReIDX/) as the anchor of the workshop. It is the first large-scale video-based benchmark for UAV-based person recognition at altitudes of up to 120 meters and different pitch angles. It also includes daylight variations and clothing changes, and supports multiple tasks: detection, tracking, and human re-identification.
WACV-2026 Workshop On Generative, Adversarial, Manipulation and Presentation Attacks In Biometrics
Newer architectures like Generative Adversarial Networks (GANs) and Diffusion models can now produce ultra-realistic content with perceptually convincing geometry, texture, and motion, challenging human perception in distinguishing synthetic from authentic content. While such realism is highly beneficial in sectors like entertainment, media, and content creation, it also poses serious threats to secure access control systems, particularly those based on biometrics. Image and video manipulation attacks have significantly evolved, leveraging both traditional image processing techniques and advanced adversarial machine learning approaches (e.g., GANs, Diffusion models). One particularly insidious attack is morphing, where a single manipulated image can compromise multiple identities, making biometric authentication highly vulnerable. Similarly, DeepFakes threaten the integrity of digital information channels, potentially enabling misinformation, identity fraud, and social engineering attacks at scale.
Alongside visual manipulation, Large Language Models (LLMs) introduce a new dimension of synthetic content creation. LLMs can generate highly coherent text, persuasive narratives, and even phishing content that mimics human writing, which can be used maliciously for social engineering, spreading disinformation, or automating attacks on information systems. The convergence of visual and textual generative AI thus amplifies the risk landscape, making detection and verification more challenging.
These developments have a dual impact: while they advance content generation, creative applications, education, and simulation-based training, they also threaten trust in digital information, compromise biometric security, and increase vulnerability to identity and information attacks.
Expected outcomes include the development of robust multimodal detection methods for visual and textual synthetic content, the creation of benchmark datasets and evaluation protocols for assessing manipulation detection systems under realistic scenarios, and the enhancement of ethical, legal, and societal frameworks for the responsible deployment of generative AI.
We propose to conduct an eighth workshop at WACV-2026: the Workshop On Manipulation, Generative, Adversarial, and Presentation Attacks In Biometrics. The workshop is planned to report advancements in the creation, evaluation, impact, and mitigation of adversarial attacks (soft and hard attacks) on biometric systems. The workshop also targets submissions addressing the analysis and mitigation of function creep attacks. This half-day workshop is the eighth edition of the special session, previously held in conjunction with BTAS-2018, WACV-2020, WACV-2021, WACV-2022, WACV-2023, WACV-2024, and WACV-2025.
Visual Art, Generative AI, and the Legal/Ethical Dilemma
Generative AI has transformed how visual art is created and circulated. Text-to-image generation systems such as Stable Diffusion, DALL·E, and Midjourney can instantly produce artworks inspired by centuries of human creativity. While these technologies democratize access to artistic tools, they also raise urgent questions about copyright, artistic integrity, and provenance. This workshop will bring together researchers, artists, legal scholars, and industry practitioners to critically examine the technical, legal, and societal challenges of visual art in the age of generative AI. By hosting this dialogue at WACV, we seek to bridge the computer vision community with the creative and legal domains, and to set a research agenda that safeguards artistic integrity while enabling innovation.
Workshop on Generative AI for Photography
The camera serves as the primary visual interface between the real and the digital world, and a photograph captures a visual snapshot that encodes rich physical concepts (e.g., illumination, blur). Meanwhile, generative AI has made significant progress in producing high-quality multi-modal content. However, current general-purpose Generative AI systems still have a limited understanding of photographic concepts, which constrains their applicability to photography and cinematography. This gap is particularly critical, as photography and cinematography are not only influential art forms but also complex multi-modal intellectual practices.
Capturing a photograph requires an agent to understand the environment, recognize the scene, compose the visual layout, compute and determine the appropriate camera settings, and trigger the shutter at the right moment. Scene understanding also involves recognizing the cultural and contextual significance of potential subjects. To compose the visual layout, the agent must select the field of view (FoV), which depends on the focal length and sensor size. To determine the camera settings, agents must reason about their visual outcomes. For example, depth of field (DoF) is jointly determined by aperture, focal length, and sensor size. Exposure depends on aperture, ISO speed, and shutter time, with the latter also influencing motion blur. Thus, photography requires integrated reasoning across perception, composition, and physics-based imaging principles.
The Workshop on Generative AI for Photography (GAIP) aims to bridge general-purpose Generative AI with the domain knowledge of photography and cinematography. It unites researchers in vision, graphics, natural language processing, generative modeling, computational photography, and cognitive science to tackle challenges at the AI–arts intersection.
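The imaging relationships described above can be made concrete with two standard formulas: the exposure value, EV = log2(N²/t) for aperture N and shutter time t, and the hyperfocal distance, H = f²/(N·c) + f for focal length f and circle of confusion c. The sketch below is purely illustrative (the function names are our own, and it assumes a full-frame circle of confusion of 0.03 mm):

```python
import math

def exposure_value(aperture_n: float, shutter_s: float) -> float:
    """Exposure value EV = log2(N^2 / t) for f-number N and shutter time t (seconds)."""
    return math.log2(aperture_n ** 2 / shutter_s)

def hyperfocal_distance_mm(focal_mm: float, aperture_n: float, coc_mm: float = 0.03) -> float:
    """Hyperfocal distance H = f^2 / (N * c) + f, with c the circle of confusion
    (0.03 mm is a common full-frame assumption)."""
    return focal_mm ** 2 / (aperture_n * coc_mm) + focal_mm

# f/8 at 1/125 s, and a 50 mm lens at f/8:
print(round(exposure_value(8.0, 1 / 125), 2))          # EV
print(round(hyperfocal_distance_mm(50.0, 8.0) / 1000, 2))  # hyperfocal distance in meters
```

For instance, f/8 at 1/125 s yields an EV of about 13, and a 50 mm lens at f/8 is sharp from roughly half of its 10.5 m hyperfocal distance to infinity, the kind of joint reasoning over aperture, focal length, and sensor parameters that a photographic agent must perform.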
GAIP will highlight new models, datasets, evaluation protocols, and benchmarks for systems that reason about composition, exposure, DoF, and visual storytelling. We expect that GAIP will advance AI research, drive real-world applications, democratize photographic expertise, and foster collaboration across academia, industry, and the creative community, making GAIP unique among WACV 2026 workshops.
Robust and Generalized Lane Topology Understanding and HD Map Generation through CoT Design
We propose to organize a workshop on "Robust and Generalized Lane Topology Understanding and HD Map Generation through CoT Design" at WACV 2026 and propose two planning-oriented lane topology understanding and HD map generation datasets with CoT (Chain-of-Thought). This workshop will provide a platform for industry experts and academics to brainstorm and exchange ideas about road-understanding CoT and the outstanding work it has inspired, advancing autonomous driving. The workshop will be organized by leading industry and academic researchers from The Chinese University of Hong Kong, Shenzhen, Tencent T-Lab, and The University of Hong Kong. Through keynote speeches, paper presentations, and discussions, we aim to foster collaboration and advance the state of the art in road understanding for autonomous driving.
Scene Graph for Structured Intelligence
Scene graphs provide a structured and interpretable representation of objects, attributes, and relationships in 2D, 3D, and even 4D scenes, serving as a vital bridge between raw visual data and high-level reasoning, which is critical for tasks such as visual reasoning, navigation, and embodied AI. With the rapid rise of multimodal foundation models, integrating scene graphs has become a timely and essential task, offering controllability, explainability, and stronger generalization across different domains and modalities. This workshop will highlight the latest advances in scene graph generation, representation learning, and their applications in vision–language reasoning, multimodal generation, and robotics. We aim to establish new benchmarks, foster interdisciplinary collaboration, and chart future directions toward the development of structured multimodal intelligence. By uniting researchers from computer vision, NLP, and robotics, the workshop will stimulate impactful discussions and accelerate progress toward trustworthy, general-purpose AI systems.
Large Language and Vision Models for Autonomous Driving
The 5th LLVM-AD workshop invites submissions that contribute to the progression of LLMs and VLMs within the domain of autonomous driving. We are particularly interested in bridging the gap between the rich image and language data found within the context of autonomous driving. Our primary areas of interest are: a) Traffic Scene Understanding enhanced by VLMs and b) Human-Autonomy Teaming driven by LLMs. Topics include, but are not limited to:
- Large Language Models and Vision Language Models for Autonomous Driving
- Multimodal Motion Planning and Prediction
- New Datasets for Autonomous Driving
- Semantics and Scene Understanding in Autonomous Driving
- Language-Driven Sensor and Traffic Simulation
- Domain Adaptation and Transfer Learning in Autonomous Driving
- Multi-Modal Fusion for Autonomous Driving
- Survey and Prospective Papers for Autonomous Driving
- Other Applications of Language or Vision Models for Driving