EndoPBR: Photorealistic Synthetic Data for Surgical 3D Vision via Physically-based Rendering
Abstract
Synthetic data has played a pivotal role in developing large-scale 3D vision models due to its high-quality annotations and ease of curation. In domains where labeled data collection is difficult, synthetic data holds promise as a means to generate the large-scale annotated datasets required to train modern neural networks. As a result, the ability to generate photorealistic synthetic data with 3D labels would be immensely helpful for domains like endoscopy, where conventional 3D reconstruction algorithms struggle and labeled data is scarce. In this work, we address a core question for data-scarce applications in 3D vision: how can we generate synthetic labeled data, and how useful is it for training downstream vision models? To this end, we first introduce a novel data generation module that takes images with known geometry and camera poses as input and estimates the material and lighting conditions of the scene. To disambiguate the training process, we leverage domain-specific properties such as non-stationary lighting and anatomical material priors. We model the material properties as a bidirectional reflectance distribution function, parameterized by a neural network. Via the rendering equation, we can generate photorealistic images at arbitrary camera poses. We demonstrate that this method produces competitive novel view synthesis results compared to previous work. Second, we use our synthetic data to train models on various downstream 3D vision tasks and find that models trained solely on our synthetic data outperform those trained on real data across all metrics and tasks. Our experiments show that synthetic data is a promising avenue towards robust 3D vision solutions in surgical scenes.
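For context, the image formation sketched in the abstract can be summarized by the standard rendering equation with a learned reflectance term; the notation below (a neural BRDF $f_\theta$, incident radiance $L_i$, and the omission of an emission term) is illustrative and not necessarily the paper's exact formulation.

$$
L_o(\mathbf{x}, \boldsymbol{\omega}_o) \;=\; \int_{\Omega} f_\theta(\mathbf{x}, \boldsymbol{\omega}_i, \boldsymbol{\omega}_o)\, L_i(\mathbf{x}, \boldsymbol{\omega}_i)\, (\mathbf{n} \cdot \boldsymbol{\omega}_i)\, \mathrm{d}\boldsymbol{\omega}_i
$$

Here $f_\theta$ denotes the neural BRDF and $L_i$ the incident radiance under the estimated lighting; under these assumptions, rendering a photorealistic image at a new camera pose amounts to evaluating this integral along each pixel's ray against the known scene geometry.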