MDUNet: Multimodal Decoding UNet for Passive Occluder-Aided Non-line-of-sight 3D Imaging
Fadlullah Raji · John Murray-Bruce
Abstract
An ordinary digital camera typically captures an image of a directly visible scene by measuring the light intensity (and color) arriving at each pixel of its image sensor from each small patch making up the scene. Conventional photography considers the measured light to be solely informative of the directly visible scene. However, recent research efforts have shown that subtle variations in the measured light intensity can also enable the imaging of scenes outside the direct line of sight. These methods exploit preexisting obstructions, which cast barely perceptible, but highly informative, soft shadows onto the observed planar surface. Whereas most prior works assume that exploitable occluders are partly or wholly known, or almost planar, a recent work employed trained diffusion-based sampling to reconstruct the hidden occluding structures in 3D jointly with a transverse 2D radiosity map of all other hidden non-occluding structures. This work proposes a novel, fully trained \textbf{multimodal decoding UNet} (MDUNet) architecture, wherein the multimodal, multipath decoder parallels recent physics-based methods, whose successes come from explicitly separating the representations and reconstructions of the occluding and non-occluding hidden scene structures. However, by sharing a latent feature representation between the two pathways, MDUNet still tightly couples the occluding and non-occluding reconstructions. As a result, MDUNet improves inference time by over $\mathbf{100\times}$ relative to a state-of-the-art diffusion-based method, and by $\mathbf{1000\times}$ relative to traditional optimization-based methods, while also improving reconstruction quality. In addition, MDUNet is trained solely in simulation yet generalizes to real experimental data, maintaining accuracy and stability even as ambient illumination increases.
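To make the shared-latent, multipath decoder design concrete, below is a minimal PyTorch sketch (not the authors' released implementation): a single encoder produces a shared latent feature representation that two decoder pathways read jointly, one emitting a voxelized 3D occluder occupancy volume (one channel per depth bin) and the other a transverse 2D radiosity map. All layer widths, the depth-bin count, and the skip-connection layout are illustrative assumptions.

```python
# Illustrative sketch only: layer widths, depth binning, and skip layout
# are assumptions, not the paper's published architecture.
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # Two 3x3 conv + ReLU layers: the standard UNet building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )


class DecoderPath(nn.Module):
    # One decoding pathway with UNet-style skip connections.
    def __init__(self, base: int, out_ch: int):
        super().__init__()
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, out_ch, 1)

    def forward(self, z, s1, s2):
        x = self.dec2(torch.cat([self.up2(z), s2], dim=1))
        x = self.dec1(torch.cat([self.up1(x), s1], dim=1))
        return self.head(x)


class MDUNetSketch(nn.Module):
    # Single encoder -> shared latent -> two decoder heads. Both heads read
    # the same latent features, so the occluding and non-occluding
    # reconstruction pathways stay tightly coupled.
    def __init__(self, in_ch: int = 3, base: int = 32, depth_bins: int = 16):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.latent = conv_block(base * 2, base * 4)  # shared representation
        self.occluder_path = DecoderPath(base, depth_bins)  # 3D occupancy
        self.radiosity_path = DecoderPath(base, 1)          # 2D radiosity map

    def forward(self, shadow_image):
        s1 = self.enc1(shadow_image)            # (B, base,   H,   W)
        s2 = self.enc2(self.pool(s1))           # (B, 2*base, H/2, W/2)
        z = self.latent(self.pool(s2))          # shared latent at H/4 x W/4
        occupancy = torch.sigmoid(self.occluder_path(z, s1, s2))
        radiosity = self.radiosity_path(z, s1, s2)
        return occupancy, radiosity


# Usage: a 128x128 input yields a (1, 16, 128, 128) occluder occupancy
# volume and a (1, 1, 128, 128) radiosity map.
occ, rad = MDUNetSketch()(torch.randn(1, 3, 128, 128))
```

Because both heads backpropagate through the same latent block, gradients from the occluder and radiosity losses shape a common representation, which is the coupling between the two reconstruction pathways that the abstract describes.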