BiNAR: A Bi-Modal Framework for Non-Aligned RGB-IR 3D Reconstruction via Gaussian Splatting
Abstract
Existing RGB-IR (infrared) bi-modal 3D reconstruction methods struggle to process non-aligned multi-modal data with significant differences in resolution and spectral characteristics while also achieving high-precision, pixel-level reconstruction. Non-aligned RGB-IR 3D reconstruction and rendering represents a new domain. To address it, we propose BiNAR, a bi-modal framework that directly processes non-aligned data captured by conventional RGB and IR cameras and generates high-resolution, pixel-level aligned renderings. BiNAR first applies cross-modal multi-camera joint calibration to accurately estimate the intrinsic and extrinsic parameters of the RGB and IR cameras and to unify their coordinate systems; it then fuses the features of the two modalities in a Unified Gaussian Field and jointly optimizes the Gaussians to obtain a cross-modally consistent 3D scene representation. Experimental results show that BiNAR significantly outperforms traditional single-modal and bi-modal Gaussian splatting methods in rendering quality, achieving a sub-pixel average reprojection error of 0.242 px and improving IR PSNR by 12.22 dB. We also build a pixel-level aligned RGB-IR dataset that covers a variety of indoor and outdoor scenes and includes real temperature data, providing a reliable benchmark for subsequent multi-modal research. The code and dataset will be made available.
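For readers unfamiliar with the calibration metric quoted above, the following is a minimal sketch of how an average reprojection error in pixels is commonly computed. It is not the BiNAR implementation: it assumes a distortion-free pinhole camera, and the function names (`project`, `mean_reprojection_error`) and all numeric values are hypothetical, chosen only for illustration.

```python
import math

def project(point_world, R, t, fx, fy, cx, cy):
    """Project a 3D world point to pixels with a distortion-free pinhole model (assumption)."""
    # Extrinsics: camera-frame coordinates X_c = R * X_w + t
    xc = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Intrinsics: perspective division, then focal length and principal point
    u = fx * xc[0] / xc[2] + cx
    v = fy * xc[1] / xc[2] + cy
    return (u, v)

def mean_reprojection_error(points_world, observed_px, R, t, fx, fy, cx, cy):
    """Average Euclidean distance (px) between projected and observed image points."""
    errors = []
    for p, (u_obs, v_obs) in zip(points_world, observed_px):
        u, v = project(p, R, t, fx, fy, cx, cy)
        errors.append(math.hypot(u - u_obs, v - v_obs))
    return sum(errors) / len(errors)

# Illustrative values only (not taken from the paper).
R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_zero = [0.0, 0.0, 0.0]
points = [(0.0, 0.0, 2.0), (0.2, -0.1, 2.0)]
observed = [(320.3, 240.4), (370.0, 215.0)]
err = mean_reprojection_error(points, observed, R_identity, t_zero,
                              fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(f"mean reprojection error: {err:.3f} px")
```

In this toy setup the first point is observed 0.5 px away from its projection and the second coincides with it, so the mean is about 0.25 px. A value below 1 px is what "sub-pixel" refers to in the abstract.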