Timezone: America/Phoenix
FRI 6 MAR
7:30 a.m.
(ends 5:00 PM)
8 a.m.
(ends 2:00 PM)
8:30 a.m.
Workshop:
(ends 12:00 PM)
9:30 a.m.
1 p.m.
Tutorial:
(ends 5:00 PM)
Workshop:
(ends 5:00 PM)
Workshop:
(ends 5:00 PM)
3 p.m.
(ends 3:45 PM)
SAT 7 MAR
8:30 a.m.
Workshop:
(ends 5:00 PM)
Workshop:
(ends 12:00 PM)
9:30 a.m.
1 p.m.
Workshop:
(ends 5:00 PM)
3 p.m.
(ends 3:45 PM)
SUN 8 MAR
8 a.m.
(ends 5:00 PM)
8:30 a.m.
9 a.m.
(ends 4:00 PM)
10 a.m.
10:15 a.m.
Orals 10:15-11:15
[10:15]
DreamAnywhere: Object-Centric Panoramic 3D Scene Generation
[10:27]
ViSTA: Visual Storytelling using Multi-modal Adapters for Text-to-Image Diffusion Models
[10:39]
Odo: Depth-Guided Diffusion for Identity-Preserving Body Reshaping
[10:51]
BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis
[11:03]
Reinforcement Learning-based Adaptive Control of Classifier-Free Guidance and Timestep Embeddings in Diffusion Models
(ends 11:15 AM)
Orals 10:15-11:15
[10:15]
TS-PCI: Point Cloud Frame Interpolation with Time-Aware Point Cloud Sampling and Self-Supervised Learning Strategy
[10:27]
Enhanced Back-Projection of Vision Features for 3D Symmetry Detection
[10:39]
OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting
[10:51]
UnderWater SLAM with Laser-light sectioning method using ST-GAT
[11:03]
Leveraging Pretrained Representations for Cross-Modal Point Cloud Completion
(ends 11:15 AM)
11:15 a.m.
Posters 11:15-1:00
MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data
EmojiDiff: Advanced Facial Expression Control with High Identity Preservation in Portrait Generation
Overcoming Fine-Grained Visual Challenges in Animal Re-Identification via Semantic Feature Alignment
(ends 1:00 PM)
(ends 5:45 PM)
noon
12:30 p.m.
Panel Discussion:
(ends 1:30 PM)
1:45 p.m.
Orals 1:45-2:45
[1:45]
MageBench: Bridging Large Multimodal Models to Agents
[1:57]
You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction
[2:09]
InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation
[2:21]
ITSELF: Attention Guided Fine-Grained Alignment for Vision–Language Retrieval
[2:33]
MarineEval: Assessing the Marine Intelligence of Vision-Language Models
(ends 2:45 PM)
Orals 1:45-2:45
[1:45]
Identity Verification from Human Scent using Channel Representation of 2D Gas Chromatography-Mass Spectrometry Data
[1:57]
milliMamba: Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion
[2:09]
OpenCowID: Zero-Shot Visual Identification of Dairy Cows
[2:21]
QCFace: Image Quality Control for boosting Face Representation & Recognition
[2:33]
MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions
(ends 2:45 PM)
2:45 p.m.
3 p.m.
Orals 3:00-3:48
[3:00]
BrightRate: Quality Assessment for User-Generated HDR Videos
[3:12]
Reviving Unsupervised Optical Flow: Concept Reevaluation, Multi-Scale Advances and Full Open-Source Release
[3:24]
UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations
[3:36]
DRWKV: Focusing on Object Edges for Low-Light Image Enhancement
(ends 4:00 PM)
Orals 3:00-3:48
[3:00]
Layout Anything: One Transformer for Universal Room Layout Estimation
[3:12]
BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities
[3:24]
Cosine Similarity is Almost All You Need (for Prototypical-Part Models)
[3:36]
Orca: Object Recognition and Comprehension for Archiving Marine Species
(ends 4:00 PM)
4 p.m.
Posters 4:00-5:45
milliMamba: Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion
BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities
(ends 5:45 PM)
MON 9 MAR
8 a.m.
(ends 5:00 PM)
8:30 a.m.
Keynote:
Dorin Comaniciu
(ends 9:30 AM)
9 a.m.
(ends 4:00 PM)
9:30 a.m.
9:45 a.m.
Orals 9:45-10:33
[9:45]
Fast Vision Mamba: Pooling Spatial Dimensions for Accelerated Processing
[9:57]
Extreme Amodal Face Detection
[10:09]
ENCORE: A Neural Collapse Perspective on Out-of-Distribution Detection in Deep Neural Networks
[10:21]
Performance of Conformal Prediction in Capturing Aleatoric Uncertainty
(ends 10:45 AM)
Orals 9:45-10:45
[9:45]
Scalpel: Fine-Grained Alignment of Attention Activation Manifolds via Mixture Gaussian Bridges to Mitigate Multimodal Hallucination
[9:57]
Unified Alignment Protocol: Making Sense of the Unlabeled Data in New Domains
[10:09]
Feedback Alignment Meets Low-Rank Manifolds: A Structured Recipe for Local Learning
[10:21]
Learning from Unknown for Open-Set Test-Time Adaptation
[10:33]
Streaming Real-Time Trajectory Prediction Using Endpoint-Aware Modeling
(ends 10:45 AM)
10:45 a.m.
(ends 6:00 PM)
1:30 p.m.
Orals 1:30-2:18
[1:30]
CalibBEV: LiDAR-Camera Calibration via BEV Alignment
[1:42]
X-JEPA: A Novel Joint Learning Cross-Modal Predictive Alignment Framework for Remote Sensing Image Retrieval
[1:54]
SSMRadNet: A Sample-wise State-Space Framework for Efficient and Ultra-Light Radar Segmentation and Object Detection
[2:06]
Rank-based Geographical Regularization: Revisiting Contrastive Self-Supervised Learning for Multispectral Remote Sensing Imagery
(ends 2:30 PM)
Orals 1:30-2:30
[1:30]
CONSTANT: Towards High-Quality One-Shot Handwriting Generation with Patch Contrastive Enhancement and Style-Aware Quantization
[1:42]
DCText: Scheduled Attention Masking for Visual Text Generation via Divide-and-Conquer Strategy
[1:54]
VFace: A Training-Free Approach for Diffusion-Based Video Face Swapping
[2:06]
VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework
[2:18]
Fine-grained Defocus Blur Control for Generative Image Models
(ends 2:30 PM)
2:30 p.m.
2:45 p.m.
Orals 2:45-3:45
[2:45]
OMeGa: Joint Optimization of Explicit Meshes and Gaussian Splats for Robust Scene-Level Surface Reconstruction
[2:57]
Confidence Through Parallel Attention for Depth and Uncertainty Estimation in Dynamic Environments
[3:09]
BiNAR: A Bi-Modal Framework for Non-Aligned RGB-IR 3D Reconstruction via Gaussian Splatting
[3:21]
Spec-Gloss Surfels and Normal–Diffuse Priors for Relightable Glossy Objects
[3:33]
Occlusion Boundary and Depth: Mutual Enhancement via Multi-Task Learning
(ends 3:45 PM)
Orals 2:45-3:45
[2:45]
Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance
[2:57]
Similarity-aware Probabilistic Embeddings Modeling for Video-Text Retrieval
[3:09]
PromptGAR: Flexible Promptive Group Activity Recognition
[3:21]
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
[3:33]
Broadcast2Pitch: Game State Reconstruction from Unconstrained Soccer Videos
(ends 3:45 PM)
3:45 p.m.
4:30 p.m.
(ends 6:15 PM)
TUE 10 MAR
8 a.m.
(ends 2:00 PM)
8:30 a.m.
Keynote:
Hilde Kühne
(ends 9:30 AM)
9 a.m.
(ends 4:00 PM)
9:30 a.m.
9:45 a.m.
Orals 9:45-10:45
[9:45]
Motion-Aware Graph Fusion NetWork for 3D Human Pose Estimation
[9:57]
UniGaze: Towards Universal Gaze Estimation via Large-scale Pre-Training
[10:09]
Unsupervised Discovery of Long-Term Spatiotemporal Periodic Workflows in Human Activities
[10:21]
VAST-ReID: A Low-Light Benchmark Dataset for Person Re-Identification with Visual and Attribute-Rich Semantic Tracking
[10:33]
DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors
(ends 10:45 AM)
Orals 9:45-10:45
[9:45]
DREAM: Dynamic Prompts and GuidedMix for Efficient Continual Adaptation of Visual-Language Models
[9:57]
brat: Aligned Multi-View Embeddings for Brain MRI Analysis
[10:09]
Towards Fine-Grained Adaptation of CLIP via a Self-Trained Alignment Score
[10:21]
Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation
[10:33]
CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering
(ends 10:45 AM)
10:45 a.m.
(ends 12:15 PM)
(ends 2:00 PM)
noon
1:30 p.m.
Orals 1:30-2:30
[1:30]
Cycle-consistent Multi-graph Matching for Self-supervised Annotation of C. Elegans
[1:42]
Automated Suturing Skill Assessment in Robot-assisted Surgery from Endoscopic Videos using Clinically-guided Evaluation Criteria
[1:54]
Deep Image Decomposition for Medical Imaging Anonymization and Curation
[2:06]
Intraoperative 2D/3D Registration via Spherical Similarity Learning and Differentiable Levenberg-Marquardt Optimization
[2:18]
ACuRE: Accurate Continuity-Regularized SpO2 Estimation Using Liquid Time-Constant Networks
(ends 2:30 PM)
Orals 1:30-2:30
[1:30]
CAST: Evaluating Multi-Object Trackers with Context-Aware Switch and Transfer Scores
[1:42]
Advancing Player Identification and Tracking with Global ID Fusion (GIF)
[1:54]
Distilling What and Why: Enhancing Driver Intention Prediction with MLLMs
[2:06]
LASER: Lip Landmark Assisted Speaker Detection for Robustness
[2:18]
VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models
(ends 2:30 PM)
2:45 p.m.
Orals 2:45-3:45
[2:45]
SCAdapter: Content-Style Disentanglement for Diffusion Style Transfer
[2:57]
T2LF: LLM-Guided Multimodal Diffusion for Text-to-Light Field Synthesis
[3:09]
VideoSketcher: A Training-Free Approach for Coherent Video Sketch Transfer
[3:21]
Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising
[3:33]
SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
(ends 3:45 PM)
Orals 2:45-3:33
[2:45]
IPTQ-ViT: Post-Training Quantization of Non-linear Functions for Integer-only Vision Transformers
[2:57]
MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data
[3:09]
Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training
[3:21]
Locally Explaining Prediction Behavior via Gradual Interventions and Measuring Property Gradients
(ends 3:45 PM)
3:45 p.m.
Posters 3:45-5:30
Enhancing Reverse Distillation with Core Exemplar Learning for Unified Multi-Class Anomaly Detection
DuPLUS: Dual-Prompt Vision-Language Framework for Universal Medical Image Segmentation and Prognosis
(ends 5:30 PM)