OMeGa: Joint Optimization of Explicit Meshes and Gaussian Splats for Robust Scene-Level Surface Reconstruction
Abstract
Neural rendering algorithms for Novel View Synthesis and Scene Reconstruction tasks have recently received much attention with the advancement of 3D Gaussian Splatting. For mesh reconstruction, most existing works over-fit a Gaussian Splatting model to multi-view images and then extract a triangle mesh from the trained model using post-optimization extraction strategies. However, such methods exhibit the following limitations: 1) Gaussian splats often yield inaccurate geometry in indoor scene reconstructions, particularly in texture-less regions, leading to suboptimal triangle mesh quality; 2) mesh extraction is entirely decoupled from the optimization process, neglecting the potential of using the mesh geometry as a constraint to guide the optimization of the Gaussian splats. To address these challenges, this paper introduces a novel end-to-end differentiable framework for both rendering and geometry reconstruction. Our key contribution is to jointly optimize 2D splats and an explicit 3D mesh representation through a flexible binding strategy during training. This allows our approach to effectively leverage mesh geometry constraints to guide the optimization of the 2D splats while preserving sufficient flexibility, resulting in both accurate alignment with scene surfaces and expressive texture representation. Furthermore, as another core component of our method, we design an iterative mesh refinement technique, comprising a novel gradient-based subdivision strategy and a face removal strategy, to further improve the detail and accuracy of the reconstructed mesh. Extensive experiments show that our joint-representation framework achieves overall state-of-the-art performance on challenging benchmarks, effectively addressing prior limitations of indoor scene reconstruction.
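The abstract does not spell out how the flexible binding between mesh faces and 2D splats is parameterized. As a minimal illustrative sketch (not the authors' implementation), one common way to realize such a binding is to attach each splat to a mesh face via learnable barycentric coordinates plus a small learnable offset along the face normal, so that gradients flow into both the splat parameters and the mesh vertices; all names here (`BoundSplats`, `offset_scale`, etc.) are hypothetical.

```python
# Hypothetical sketch of a mesh-to-splat binding; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundSplats(nn.Module):
    def __init__(self, n_splats, face_ids, offset_scale=0.01):
        super().__init__()
        # Each splat is attached to one mesh face; barycentric logits and a
        # scalar normal offset are optimized jointly with the mesh vertices.
        self.register_buffer("face_ids", face_ids)              # (n_splats,)
        self.bary_logits = nn.Parameter(torch.zeros(n_splats, 3))
        self.normal_offset = nn.Parameter(torch.zeros(n_splats, 1))
        self.offset_scale = offset_scale

    def forward(self, vertices, faces):
        # Gather the three corners of the face each splat is bound to.
        tri = vertices[faces[self.face_ids]]                     # (n_splats, 3, 3)
        bary = torch.softmax(self.bary_logits, dim=-1)           # stays inside the face
        center_on_face = (bary.unsqueeze(-1) * tri).sum(dim=1)   # (n_splats, 3)
        # A small learnable displacement along the face normal keeps the splat
        # guided by, but not rigidly glued to, the mesh surface.
        normal = F.normalize(torch.cross(tri[:, 1] - tri[:, 0],
                                         tri[:, 2] - tri[:, 0], dim=-1), dim=-1)
        return center_on_face + self.offset_scale * self.normal_offset * normal
```

Under this kind of parameterization, splat centers remain differentiable functions of the mesh vertices, which is one way the mesh geometry can act as a soft constraint on the splat optimization while the normal offset preserves flexibility for fine texture detail.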