Timezone: America/Phoenix

SUN 8 MAR
8 a.m.
(ends 5:00 PM)
8:30 a.m.
Remarks:
(ends 9:00 AM)
9 a.m.
Keynote:
Ravi Ramamoorthi
(ends 10:00 AM)
10 a.m.
Break:
(ends 10:15 AM)
10:15 a.m.
Orals 10:15-11:15
[10:15] DreamAnywhere: Object-Centric Panoramic 3D Scene Generation
[10:27] ViSTA: Visual Storytelling using Multi-modal Adapters for Text-to-Image Diffusion Models
[10:39] Odo: Depth-Guided Diffusion for Identity-Preserving Body Reshaping
[10:51] BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis
[11:03] Reinforcement Learning-based Adaptive Control of Classifier-Free Guidance and Timestep Embeddings in Diffusion Models
(ends 11:15 AM)
Orals 10:15-11:15
[10:15] TS-PCI: Point Cloud Frame Interpolation with Time-Aware Point Cloud Sampling and Self-Supervised Learning Strategy
[10:27] Enhanced Back-Projection of Vision Features for 3D Symmetry Detection
[10:39] OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting
[10:51] UnderWater SLAM with Laser-light sectioning method using ST-GAT
[11:03] Leveraging Pretrained Representations for Cross-Modal Point Cloud Completion
(ends 11:15 AM)
11:15 a.m.
Posters 11:15-1:00
(ends 1:00 PM)
noon
Break:
(ends 1:30 PM)
1:45 p.m.
Orals 1:45-2:45
[1:45] MageBench: Bridging Large Multimodal Models to Agents
[1:57] You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction
[2:09] InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation
[2:21] ITSELF: Attention Guided Fine-Grained Alignment for Vision–Language Retrieval
[2:33] MarineEval: Assessing the Marine Intelligence of Vision-Language Models
(ends 2:45 PM)
Orals 1:45-2:45
[1:45] Identity Verification from Human Scent using Channel Representation of 2D Gas Chromatography-Mass Spectrometry Data
[1:57] milliMamba: Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion
[2:09] OpenCowID: Zero-Shot Visual Identification of Dairy Cows
[2:21] QCFace: Image Quality Control for boosting Face Representation & Recognition
[2:33] MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions
(ends 2:45 PM)
2:45 p.m.
Break:
(ends 3:00 PM)
3 p.m.
Orals 3:00-3:48
[3:00] BrightRate: Quality Assessment for User-Generated HDR Videos
[3:12] Reviving Unsupervised Optical Flow: Concept Reevaluation, Multi-Scale Advances and Full Open-Source Release
[3:24] UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations
[3:36] DRWKV: Focusing on Object Edges for Low-Light Image Enhancement
(ends 4:00 PM)
Orals 3:00-3:48
[3:00] Layout Anything: One Transformer for Universal Room Layout Estimation
[3:12] BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities
[3:24] Cosine Similarity is Almost All You Need (for Prototypical-Part Models)
[3:36] Orca: Object Recognition and Comprehension for Archiving Marine Species
(ends 4:00 PM)
4 p.m.
Posters 4:00-5:45
(ends 5:45 PM)

MON 9 MAR
8 a.m.
(ends 5:00 PM)
8:30 a.m.
9 a.m.
9:30 a.m.
Break:
(ends 9:45 AM)
9:45 a.m.
Orals 9:45-10:33
[9:45] Fast Vision Mamba: Pooling Spatial Dimensions for Accelerated Processing
[9:57] Extreme Amodal Face Detection
[10:09] ENCORE: A Neural Collapse Perspective on Out-of-Distribution Detection in Deep Neural Networks
[10:21] Performance of Conformal Prediction in Capturing Aleatoric Uncertainty
(ends 10:45 AM)
Orals 9:45-10:45
[9:45] Scalpel: Fine-Grained Alignment of Attention Activation Manifolds via Mixture Gaussian Bridges to Mitigate Multimodal Hallucination
[9:57] Unified Alignment Protocol: Making Sense of the Unlabeled Data in New Domains
[10:09] Feedback Alignment Meets Low-Rank Manifolds: A Structured Recipe for Local Learning
[10:21] Learning from Unknown for Open-Set Test-Time Adaptation
[10:33] Streaming Real-Time Trajectory Prediction Using Endpoint-Aware Modeling
(ends 10:45 AM)
10:45 a.m.
Posters 10:45-12:30
(ends 12:30 PM)
noon
Doctoral Consortium:
(ends 2:00 PM)
Break:
(ends 1:30 PM)
1:30 p.m.
Orals 1:30-2:18
[1:30] CalibBEV: LiDAR-Camera Calibration via BEV Alignment
[1:42] X-JEPA: A Novel Joint Learning Cross-Modal Predictive Alignment Framework for Remote Sensing Image Retrieval
[1:54] SSMRadNet: A Sample-wise State-Space Framework for Efficient and Ultra-Light Radar Segmentation and Object Detection
[2:06] Rank-based Geographical Regularization: Revisiting Contrastive Self-Supervised Learning for Multispectral Remote Sensing Imagery
(ends 2:30 PM)
Orals 1:30-2:30
[1:30] CONSTANT: Towards High-Quality One-Shot Handwriting Generation with Patch Contrastive Enhancement and Style-Aware Quantization
[1:42] DCText: Scheduled Attention Masking for Visual Text Generation via Divide-and-Conquer Strategy
[1:54] VFace: A Training-Free Approach for Diffusion-Based Video Face Swapping
[2:06] VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework
[2:18] Fine-grained Defocus Blur Control for Generative Image Models
(ends 2:30 PM)
2:30 p.m.
Break:
(ends 2:45 PM)
2:45 p.m.
Orals 2:45-3:45
[2:45] OMeGa: Joint Optimization of Explicit Meshes and Gaussian Splats for Robust Scene-Level Surface Reconstruction
[2:57] Confidence Through Parallel Attention for Depth and Uncertainty Estimation in Dynamic Environments
[3:09] BiNAR: A Bi-Modal Framework for Non-Aligned RGB-IR 3D Reconstruction via Gaussian Splatting
[3:21] Spec-Gloss Surfels and Normal–Diffuse Priors for Relightable Glossy Objects
[3:33] Occlusion Boundary and Depth: Mutual Enhancement via Multi-Task Learning
(ends 3:45 PM)
Orals 2:45-3:45
[2:45] Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance
[2:57] Similarity-aware Probabilistic Embeddings Modeling for Video-Text Retrieval
[3:09] PromptGAR: Flexible Promptive Group Activity Recognition
[3:21] Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
[3:33] Broadcast2Pitch: Game State Reconstruction from Unconstrained Soccer Videos
(ends 3:45 PM)
3:45 p.m.
Meeting:
(ends 4:30 PM)
4:30 p.m.
Posters 4:30-6:15
(ends 6:15 PM)

TUE 10 MAR
8 a.m.
(ends 2:00 PM)
8:30 a.m.
9 a.m.
9:30 a.m.
Break:
(ends 9:45 AM)
9:45 a.m.
Orals 9:45-10:45
[9:45] Motion-Aware Graph Fusion NetWork for 3D Human Pose Estimation
[9:57] UniGaze: Towards Universal Gaze Estimation via Large-scale Pre-Training
[10:09] Unsupervised Discovery of Long-Term Spatiotemporal Periodic Workflows in Human Activities
[10:21] VAST-ReID: A Low-Light Benchmark Dataset for Person Re-Identification with Visual and Attribute-Rich Semantic Tracking
[10:33] DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors
(ends 10:45 AM)
Orals 9:45-10:45
[9:45] DREAM: Dynamic Prompts and GuidedMix for Efficient Continual Adaptation of Visual-Language Models
[9:57] brat: Aligned Multi-View Embeddings for Brain MRI Analysis
[10:09] Towards Fine-Grained Adaptation of CLIP via a Self-Trained Alignment Score
[10:21] Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation
[10:33] CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering
(ends 10:45 AM)
10:45 a.m.
Posters 10:45-12:15
(ends 12:15 PM)
noon
Break:
(ends 1:30 PM)
1:30 p.m.
Orals 1:30-2:30
[1:30] Cycle-consistent Multi-graph Matching for Self-supervised Annotation of C. Elegans
[1:42] Automated Suturing Skill Assessment in Robot-assisted Surgery from Endoscopic Videos using Clinically-guided Evaluation Criteria
[1:54] Deep Image Decomposition for Medical Imaging Anonymization and Curation
[2:06] Intraoperative 2D/3D Registration via Spherical Similarity Learning and Differentiable Levenberg-Marquardt Optimization
[2:18] ACuRE: Accurate Continuity-Regularized SpO2 Estimation Using Liquid Time-Constant Networks
(ends 2:30 PM)
Orals 1:30-2:30
[1:30] CAST: Evaluating Multi-Object Trackers with Context-Aware Switch and Transfer Scores
[1:42] Advancing Player Identification and Tracking with Global ID Fusion (GIF)
[1:54] Distilling What and Why: Enhancing Driver Intention Prediction with MLLMs
[2:06] LASER: Lip Landmark Assisted Speaker Detection for Robustness
[2:18] VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models
(ends 2:30 PM)
2:45 p.m.
Orals 2:45-3:45
[2:45] SCAdapter: Content-Style Disentanglement for Diffusion Style Transfer
[2:57] T2LF: LLM-Guided Multimodal Diffusion for Text-to-Light Field Synthesis
[3:09] VideoSketcher: A Training-Free Approach for Coherent Video Sketch Transfer
[3:21] Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising
[3:33] SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
(ends 3:45 PM)
Orals 2:45-3:33
[2:45] IPTQ-ViT: Post-Training Quantization of Non-linear Functions for Integer-only Vision Transformers
[2:57] MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data
[3:09] Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training
[3:21] Locally Explaining Prediction Behavior via Gradual Interventions and Measuring Property Gradients
(ends 3:45 PM)
3:45 p.m.
Posters 3:45-5:30
(ends 5:30 PM)