Track: Oral Session 1B: 3D Computer Vision I

Sun 8 March 10:15 - 10:27 PDT

TS-PCI: Point Cloud Frame Interpolation with Time-Aware Point Cloud Sampling and Self-Supervised Learning Strategy

Kohei Matsuzaki ⋅ Keisuke Nonaka

Recent point cloud frame interpolation methods predict an interpolated frame by merging two intermediate frames constructed through scene flow estimation. However, because these methods merge the frames with a generative approach, generation errors can compound the errors already present in the scene flow estimates, degrading interpolation performance. In this paper, we propose TS-PCI, a point cloud frame interpolation method with time-aware point cloud sampling and a self-supervised learning strategy. The proposed method introduces a time-aware, learning-based point cloud sampling model that merges the two frames into a single frame with a non-generative approach. It also introduces an attention-based geometry refinement model to improve the geometric quality of the sampled point clouds. Furthermore, it adopts a self-supervised learning strategy that dynamically creates ground-truth labels for point cloud sampling, allowing the models to be trained end-to-end. Experimental results on three large-scale datasets show that the proposed method outperforms state-of-the-art methods.
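The two-intermediate-frame setup described in this abstract can be sketched as follows, assuming linear motion along the estimated scene flow. The `time_aware_merge` function is a hypothetical, non-learned stand-in for the paper's learned sampling model. Like a non-generative merge, it only selects existing points and never synthesizes new ones, but it simply favors the frame that is closer in time to t. All function names here are illustrative, not taken from the paper.

```python
import random


def warp(points, flow, scale):
    """Move each point along its scene-flow vector by `scale` (linear-motion assumption)."""
    return [tuple(p[k] + scale * f[k] for k in range(3)) for p, f in zip(points, flow)]


def intermediate_frames(p0, flow_0to1, p1, flow_1to0, t):
    """Build the two intermediate frames at time t in [0, 1] from the two input frames."""
    from0 = warp(p0, flow_0to1, t)          # frame 0 pushed forward by a fraction t
    from1 = warp(p1, flow_1to0, 1.0 - t)    # frame 1 pulled back by a fraction 1 - t
    return from0, from1


def time_aware_merge(from0, from1, t, n_out, seed=0):
    """Hypothetical non-learned stand-in for a learned, time-aware sampler.

    It only selects existing points: about (1 - t) * n_out points come from
    frame 0's warp and the rest from frame 1's warp, so the temporally closer
    frame contributes more. The result can be shorter than n_out if a frame
    has too few points.
    """
    rng = random.Random(seed)
    n0 = min(len(from0), round((1.0 - t) * n_out))
    n1 = min(len(from1), n_out - n0)
    return rng.sample(from0, n0) + rng.sample(from1, n1)
```

In the method described above, the fixed proportional rule would be replaced by a trained model, and the selected points would then pass through the geometry refinement stage.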

Sun 8 March 10:27 - 10:39 PDT

Enhanced Back-Projection of Vision Features for 3D Symmetry Detection

Isaac Aguirre ⋅ Ivan Sipiran

We propose two algorithms for 3D symmetry detection based on enhanced back-projection of vision features extracted from foundation vision models such as DINOv2. Our method renders multiple views of a 3D object, extracts features from each view, and projects them back onto the geometry, with two key improvements that increase robustness and accuracy: Fibonacci view sampling and view rotations. Using these features, we detect symmetry planes and axes with two dedicated algorithms. Experiments on ShapeNet show that our plane detection approach outperforms both traditional geometric and learning-based methods by a wide margin. The method is also efficient, running in seconds on a single 8GB GPU, which makes it practical for large-scale or real-world applications. Overall, our results demonstrate that enhanced back-projection of vision features offers a simple yet effective framework for solving fundamental 3D geometric problems such as symmetry detection.
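A minimal sketch of the ingredients named in this abstract, under stated assumptions. `fibonacci_views` places camera positions on a Fibonacci lattice, a standard near-uniform sphere sampling. `average_backprojected` assumes per-view features have already been projected onto the visible vertices and simply averages them. `plane_score` is a hypothetical rule for ranking a candidate symmetry plane by how well each vertex's feature agrees with the feature found at its mirror image. None of these names or scoring choices come from the paper.

```python
import math


def fibonacci_views(n, radius=1.0):
    """Return n camera positions spread near-uniformly over a sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # about 2.39996 rad
    views = []
    for i in range(n):
        y = 1.0 - 2.0 * (i + 0.5) / n      # heights evenly spaced in (-1, 1)
        r = math.sqrt(1.0 - y * y)         # radius of the latitude circle at height y
        theta = golden_angle * i           # advance by the golden angle each step
        views.append((radius * r * math.cos(theta),
                      radius * y,
                      radius * r * math.sin(theta)))
    return views


def average_backprojected(per_view_features, num_vertices, dim):
    """Average features back-projected from several views onto mesh vertices.

    per_view_features: one dict per view mapping a visible vertex id to its
    feature vector (assumed already computed). Unseen vertices get zeros.
    """
    sums = [[0.0] * dim for _ in range(num_vertices)]
    counts = [0] * num_vertices
    for view in per_view_features:
        for v, feat in view.items():
            for k in range(dim):
                sums[v][k] += feat[k]
            counts[v] += 1
    return [[s / counts[v] for s in sums[v]] if counts[v] else sums[v]
            for v in range(num_vertices)]


def reflect(p, normal, d):
    """Mirror point p across the plane {x : normal . x = d}; normal must be unit length."""
    dist = sum(pi * ni for pi, ni in zip(p, normal)) - d
    return tuple(pi - 2.0 * dist * ni for pi, ni in zip(p, normal))


def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is zero."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def plane_score(vertices, features, normal, d):
    """Hypothetical score for a candidate symmetry plane: mean cosine similarity
    between each vertex's feature and that of the vertex nearest its mirror image."""
    total = 0.0
    for p, f in zip(vertices, features):
        q = reflect(p, normal, d)
        j = min(range(len(vertices)), key=lambda k: math.dist(q, vertices[k]))
        total += cosine(f, features[j])
    return total / len(vertices)
```

The brute-force nearest-neighbor search keeps the sketch dependency-free. A practical version would use a spatial index and would score many candidate planes, for example normals drawn from the same Fibonacci lattice.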

Sun 8 March 10:39 - 10:51 PDT

OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting

Atakan Topaloğlu ⋅ Kunyi Li ⋅ Michael Niemeyer ⋅ Nassir Navab ⋅ Ahmet Tekalp ⋅ Federico Tombari

Sparse-view novel view synthesis is fundamentally ill-posed due to severe geometric ambiguity. Current methods are caught in a trade-off: regressive models are geometrically faithful but incomplete, whereas generative models can complete scenes but often introduce structural inconsistencies. We propose OracleGS, a novel framework that reconciles generative completeness with regressive fidelity for sparse-view Gaussian Splatting. Instead of using generative models to patch incomplete reconstructions, our "propose-and-validate" framework first leverages a pre-trained 3D-aware diffusion model to synthesize novel views that propose a complete scene. We then repurpose a multi-view stereo (MVS) model as a 3D-aware oracle that validates the generated views by estimating their 3D uncertainty: its attention maps distinguish regions where the generated views are well supported by multi-view evidence from regions of high uncertainty caused by occlusion, lack of texture, or direct inconsistency. This uncertainty signal directly guides the optimization of a 3D Gaussian Splatting model via an uncertainty-weighted loss. By conditioning the powerful generative prior on multi-view geometric evidence, our approach filters hallucinatory artifacts while preserving plausible completions in under-constrained regions, and it outperforms state-of-the-art methods on datasets including Mip-NeRF 360 and NeRF Synthetic.
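One common form of an uncertainty-weighted photometric loss is sketched below. The exact loss used by OracleGS is not given in this abstract, so the confidence weight 1 - u, the L1 error term, and the function name are assumptions for illustration. Here `pred` stands for the Gaussian Splatting rendering at a generated viewpoint and `target` for the diffusion-generated image at that viewpoint.

```python
def uncertainty_weighted_l1(pred, target, uncertainty, eps=1e-8):
    """Mean L1 error over pixels of a generated view, weighted by confidence 1 - u.

    pred, target: flattened pixel intensities of the rendered and generated view.
    uncertainty: per-pixel values assumed to lie in [0, 1], where 1 means the
    oracle finds no multi-view support, so that pixel is ignored entirely.
    """
    weights = [1.0 - u for u in uncertainty]
    weighted_error = sum(w * abs(p - t) for w, p, t in zip(weights, pred, target))
    return weighted_error / (sum(weights) + eps)  # eps avoids division by zero
```

Normalizing by the total weight keeps the loss scale comparable across views with very different amounts of well-supported content.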

Sun 8 March 10:51 - 11:03 PDT

UnderWater SLAM with Laser-light sectioning method using ST-GAT

Heyang Gao ⋅ Kazuto Ichimaru ⋅ Takafumi Iwaguchi ⋅ Hiroshi Kawasaki

Multi-line laser ID assignment is crucial for underwater 3D reconstruction, but it fails when the laser lines are fragmented. We reformulate ID assignment as a graph-based sequence labeling task and propose a novel two-stage hierarchical framework using Spatio-Temporal Graph Attention Networks (ST-GAT). Our method first reasons over a spatio-temporal graph of laser endpoints and intersections to handle local fragmentation, then lifts this reasoning to a global, segment-level optimization with trajectory-constrained Viterbi decoding to ensure temporal consistency. This GNN-based approach removes the reliance on complete epipolar geometry. Experiments on real underwater datasets demonstrate superior reconstruction completeness and temporal stability, especially in challenging environments where traditional methods fail.
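The segment-level decoding step can be illustrated with a standard log-space Viterbi decoder. This is a generic sketch, not the authors' implementation: the emission scores stand in for per-segment ST-GAT outputs, and the trajectory constraint is modeled, as an assumption, by setting forbidden ID transitions to negative infinity in the transition matrix.

```python
def viterbi(emission, transition):
    """Log-space Viterbi decoding.

    emission[t][s]: log-score of laser ID s for segment t (e.g. from a classifier).
    transition[a][b]: log-score of moving from ID a to ID b between consecutive
    segments; float('-inf') forbids a transition (a hard trajectory constraint).
    Returns the highest-scoring ID sequence.
    """
    n_states = len(emission[0])
    score = list(emission[0])
    back = []  # back[t - 1][b] = best previous ID when segment t has ID b
    for t in range(1, len(emission)):
        new_score, ptr = [], []
        for b in range(n_states):
            best_a = max(range(n_states), key=lambda a: score[a] + transition[a][b])
            new_score.append(score[best_a] + transition[best_a][b] + emission[t][b])
            ptr.append(best_a)
        score = new_score
        back.append(ptr)
    # Trace back from the best final ID.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

For example, with `emission = [[0.0, -1.0], [-1.0, 0.0], [0.0, -2.0]]`, an all-zero transition matrix yields `[0, 1, 0]`, whereas forbidding ID switches yields `[0, 0, 0]`: the constraint overrides the locally best label in the middle segment to keep the sequence consistent.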

Sun 8 March 11:03 - 11:15 PDT

Leveraging Pretrained Representations for Cross-Modal Point Cloud Completion

Kshitij Kale ⋅ Hrishikesh U ⋅ V Sreenidhe ⋅ Shylaja S

The utility of 3D point clouds in critical applications such as robotics is often hindered by their inherent incompleteness, a result of real-world occlusions and limited sensor viewpoints. To overcome this, image-guided 3D point cloud completion aims to reconstruct complete shapes by leveraging a corresponding 2D image. However, current methods typically train a cross-modal network from scratch and often fail to capture the high-level semantic context and complex structural information required for robust reconstruction. This paper challenges that paradigm by demonstrating that pre-existing knowledge from large-scale pretrained vision models can be effectively leveraged to guide the completion process. We introduce a novel Dual Branch Image Encoder, a dedicated module designed to extract and fuse rich semantic priors from a pretrained Vision Transformer with geometric depth cues. This fused representation provides a powerful, multifaceted guide that is integrated into EGIInet, a state-of-the-art point cloud completion network. Our experiments show that by conditioning the completion on these strong pretrained priors, our method outperforms existing state-of-the-art techniques by 7% without changing the rest of the architecture, producing more semantically coherent and structurally accurate 3D shapes.