Timezone: America/Phoenix
SUN 9 MAR
8 a.m.
(ends 5:00 PM)
9:30 a.m.
2:30 p.m.
3:45 p.m.
MON 10 MAR
8 a.m.
(ends 2:00 PM)
9:30 a.m.
noon
FRI 6 MAR
8 a.m.
(ends 2:00 PM)
SAT 7 MAR
8 a.m.
(ends 2:00 PM)
SUN 8 MAR
8 a.m.
(ends 5:00 PM)
8:30 a.m.
9 a.m.
(ends 4:00 PM)
10 a.m.
10:15 a.m.
Orals -
brat: Aligned Multi-View Embeddings for Brain MRI Analysis
CalibBEV: LiDAR-Camera Calibration via BEV Alignment
LASER: Lip Landmark Assisted Speaker Detection for Robustness
Reinforcement Learning-based Adaptive Control of Classifier-Free Guidance and Timestep Embeddings in Diffusion Models
ViSTA: Visual Storytelling using Multi-modal Adapters for Text-to-Image Diffusion Models
(ends 11:15 AM)
Orals -
ENCORE : A Neural Collapse Perspective on Out-of-Distribution Detection in Deep Neural Networks
MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions
Scalpel: Fine-Grained Alignment of Attention Activation Manifolds via Mixture Gaussian Bridges to Mitigate Multimodal Hallucination
SSMRadNet : A Sample-wise State-Space Framework for Efficient and Ultra-Light Radar Segmentation and Object Detection
Towards Fine-Grained Adaptation of CLIP via a Self-Trained Alignment Score
(ends 11:15 AM)
11:15 a.m.
Posters -
EmojiDiff: Advanced Facial Expression Control with High Identity Preservation in Portrait Generation
MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data
MR-Pruner: Training-free Multi-resolution Visual Token Pruning for Multi-modal Large Language Models
(ends 1:00 PM)
(ends 5:45 PM)
noon
12:30 p.m.
Panel Discussion:
(ends 1:30 PM)
1:45 p.m.
Orals -
Intraoperative 2D/3D Registration via Spherical Similarity Learning and Differentiable Levenberg-Marquardt Optimization
Locally Explaining Prediction Behavior via Gradual Interventions and Measuring Property Gradients
QCFace: Image Quality Control for boosting Face Representation & Recognition
TS-PCI: Point Cloud Frame Interpolation with Time-Aware Point Cloud Sampling and Self-Supervised Learning Strategy
Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising
(ends 2:45 PM)
Orals -
Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training
Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance
Extreme Amodal Face Detection
MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data
You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction
(ends 2:45 PM)
2:45 p.m.
3 p.m.
Orals -
Fast Vision Mamba: Pooling Spatial Dimensions for Accelerated Processing
MarineEval: Assessing the Marine Intelligence of Vision-Language Models
OpenCowID: Zero-Shot Visual Identification of Dairy Cows
Rank-based Geographical Regularization: Revisiting Contrastive Self-Supervised Learning for Multispectral Remote Sensing Imagery
(ends 4:00 PM)
Orals -
Advancing Player Identification and Tracking with Global ID Fusion (GIF)
DreamAnywhere: Object-Centric Panoramic 3D Scene Generation
MageBench: Bridging Large Multimodal Models to Agents
UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations
X-JEPA: A Novel Joint Learning Cross-Modal Predictive Alignment Framework for Remote Sensing Image Retrieval
(ends 4:00 PM)
4 p.m.
Posters -
BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities
milliMamba: Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion
Non‑Contact Blood Pressure Estimation from Face Videos via Physiology‑Aware Contrastive Learning
(ends 5:45 PM)
MON 9 MAR
8:30 a.m.
Keynote:
Dorin Comaniciu
(ends 9:30 AM)
9 a.m.
(ends 4:00 PM)
9:45 a.m.
Orals -
BrightRate: Quality Assessment for User-Generated HDR Videos
Layout Anything: One Transformer for Universal Room Layout Estimation
milliMamba: Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion
VFace: A Training-Free Approach for Diffusion-Based Video Face Swapping
(ends 10:45 AM)
Orals -
ACuRE: Accurate Continuity-Regularized SpO2 Estimation Using Liquid Time-Constant Networks
DCText: Scheduled Attention Masking for Visual Text Generation via Divide-and-Conquer Strategy
Deep Image Decomposition for Medical Imaging Anonymization and Curation
Spec-Gloss Surfels and Normal–Diffuse Priors for Relightable Glossy Objects
(ends 10:45 AM)
10:45 a.m.
(ends 1:30 PM)
(ends 6:00 PM)
1:30 p.m.
Orals -
Broadcast2Pitch: Game State Reconstruction from Unconstrained Soccer Videos
Orca: Object Recognition and Comprehension for Archiving Marine Species
Similarity-aware Probabilistic Embeddings Modeling for Video-Text Retrieval
T2LF: LLM-Guided Multimodal Diffusion for Text-to-Light Field Synthesis
VideoSketcher: A Training-Free Approach for Coherent Video Sketch Transfer
(ends 2:30 PM)
Orals -
Confidence Through Parallel Attention for Depth and Uncertainty Estimation in Dynamic Environments
DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors
DREAM: Dynamic Prompts and GuidedMix for Efficient Continual Adaptation of Visual-Language Models
Feedback Alignment Meets Low-Rank Manifolds: A Structured Recipe for Local Learning
(ends 2:30 PM)
2:45 p.m.
Orals -
Automated Suturing Skill Assessment in Robot-assisted Surgery from Endoscopic Videos using Clinically-guided Evaluation Criteria
Cycle-consistent Multi-graph Matching for Self-supervised Annotation of C. Elegans
Enhanced Back-Projection of Vision Features for 3D Symmetry Detection
Occlusion Boundary and Depth: Mutual Enhancement via Multi-Task Learning
VAST-ReID: A Low-Light Benchmark Dataset for Person Re-Identification with Visual and Attribute-Rich Semantic Tracking
(ends 3:45 PM)
Orals -
BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities
Distilling What and Why: Enhancing Driver Intention Prediction with MLLMs
Fine-grained Defocus Blur Control for Generative Image Models
VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models
VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework
(ends 3:45 PM)
4:30 p.m.
(ends 6:15 PM)
TUE 10 MAR
8:30 a.m.
Keynote:
Hilde Kühne
(ends 9:30 AM)
9 a.m.
(ends 2:00 PM)
9:45 a.m.
Orals -
DRWKV: Focusing on Object Edges for Low-Light Image Enhancement
Learning from Unknown for Open-Set Test-Time Adaptation
OracleGS: Training-Free Sparse-View Gaussian Splatting
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
UniGaze: Towards Universal Gaze Estimation via Large-scale Pre-Training
(ends 10:45 AM)
Orals -
BiNAR: A Bi-Modal Framework for Non-Aligned RGB-IR 3D Reconstruction via Gaussian Splatting
Cosine Similarity is Almost All You Need (for Prototypical-Part Models)
Motion-Aware Graph Fusion NetWork for 3D Human Pose Estimation
Odo: Depth-Guided Diffusion for Identity-Preserving Body Reshaping
SCAdapter: Content-Style Disentanglement for Diffusion Style Transfer
(ends 10:45 AM)
10:45 a.m.
(ends 2:00 PM)
(ends 12:15 PM)
1:30 p.m.
Orals -
CONSTANT: Towards High-Quality One-Shot Handwriting Generation with Patch Contrastive Enhancement and Style-Aware Quantization
Performance of Conformal Prediction in Capturing Aleatoric Uncertainty
Reviving Unsupervised Optical Flow: Concept Reevaluation, Multi-Scale Advances and Full Open-Source Release
Streaming Real-Time Trajectory Prediction Using Endpoint-Aware Modeling
Unified Alignment Protocol: Making Sense of the Unlabeled Data in New Domains
(ends 2:45 PM)
2:45 p.m.
Orals -
CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering
IPTQ-ViT: Post-Training Quantization of Non-linear Functions for Integer-only Vision Transformers
PromptGAR: Flexible Promptive Group Activity Recognition
Unsupervised Discovery of Long-Term Spatiotemporal Periodic Workflows in Human Activities
(ends 3:45 PM)
Orals -
Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation
InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation
ITSELF: Attention Guided Fine-Grained Alignment for Vision–Language Retrieval
OMeGa: Joint Optimization of Explicit Meshes and Gaussian Splats for Robust Scene-Level Surface Reconstruction
SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
(ends 3:45 PM)
Orals -
BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis
CAST: Evaluating Multi-Object Trackers with Context-Aware Switch and Transfer Scores
Identity Verification from Human Scent using Channel Representation of 2D Gas Chromatography-Mass Spectrometry Data
Leveraging Pretrained Representations for Cross-Modal Point Cloud Completion
UnderWater SLAM with Laser-light sectioning method using ST-GAT
(ends 3:45 PM)
3:45 p.m.
Posters -
Enhancing Reverse Distillation with Core Exemplar Learning for Unified Multi-Class Anomaly Detection
Exploiting Label-Independent Regularization from Spatial Dependencies for Whole Slide Image Analysis
Relevance-aware Multi-context Contrastive Decoding for Retrieval-augmented Visual Question Answering
(ends 5:30 PM)