Power of Boundary and Reflection: Semantic Transparent Object Segmentation using Pyramid Vision Transformer with Transparent Cues
Tuan-Anh Vu · Nguyen Hai · Ziqiang Zheng · Binh-Son Hua · Qing Guo · Ivor Tsang · Sai-Kit Yeung
Abstract
Glass is a prevalent material among solid objects in everyday life, but segmentation methods struggle to distinguish it from opaque materials due to its transparency and reflection. While human perception is known to rely on boundary and reflection cues to recognize glass objects, existing methods have yet to sufficiently capture both properties when handling transparent objects. Hence, we propose to incorporate both of these powerful visual cues via Boundary Feature Enhancement and Reflection Feature Enhancement modules in a mutually beneficial way. Our proposed framework, $\textbf{TransCues}$, is a pyramidal transformer encoder-decoder architecture for segmenting transparent objects. We empirically show that these two modules can be used together effectively, improving overall performance on various benchmark datasets, including glass object datasets, mirror object datasets, and generic segmentation datasets containing both. Our method outperforms the state-of-the-art by a large margin, achieving $\textbf{+4.2}$% mIoU on Trans10K-v2, $\textbf{+5.6}$% mIoU on MSD, $\textbf{+10.1}$% mIoU on RGBD-Mirror, $\textbf{+13.1}$% mIoU on TROSD, and $\textbf{+8.3}$% mIoU on Stanford2D3D, demonstrating the effectiveness of our method on glass and mirror objects.
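To make the high-level design concrete, below is a minimal, illustrative sketch of the architecture the abstract describes: a pyramid encoder feeding Boundary and Reflection Feature Enhancement modules before a segmentation head. All class names (`BoundaryFeatureEnhancement`, `ReflectionFeatureEnhancement`, `TransCuesSketch`), layer choices, and sizes are hypothetical stand-ins assumed for illustration, not the authors' implementation; a real model would use a PVT-style transformer backbone rather than the strided convolutions used here.

```python
# Hedged sketch of a TransCues-style pipeline; names and layers are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BoundaryFeatureEnhancement(nn.Module):
    """Hypothetical BFE: predict a boundary map, then re-weight features with it."""
    def __init__(self, channels):
        super().__init__()
        self.boundary_head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feat):
        boundary = torch.sigmoid(self.boundary_head(feat))   # (B, 1, H, W) boundary cue
        return self.fuse(feat * (1.0 + boundary)), boundary  # emphasize boundary regions


class ReflectionFeatureEnhancement(nn.Module):
    """Hypothetical RFE: residual attention branch that highlights reflective responses."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feat):
        return feat + feat * self.attn(feat)  # amplify reflection-like activations


class TransCuesSketch(nn.Module):
    """Toy pyramid encoder-decoder; strided convs stand in for transformer stages."""
    def __init__(self, num_classes=12, widths=(32, 64, 128)):
        super().__init__()
        self.stages = nn.ModuleList()
        in_ch = 3
        for w in widths:  # each stage halves the spatial resolution
            self.stages.append(nn.Sequential(
                nn.Conv2d(in_ch, w, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(w),
                nn.ReLU(inplace=True)))
            in_ch = w
        self.bfe = BoundaryFeatureEnhancement(widths[-1])
        self.rfe = ReflectionFeatureEnhancement(widths[-1])
        self.head = nn.Conv2d(widths[-1], num_classes, kernel_size=1)

    def forward(self, x):
        for stage in self.stages:
            x = stage(x)
        f, boundary = self.bfe(x)   # boundary cue
        f = self.rfe(f)             # reflection cue
        logits = self.head(f)
        # upsample back to input resolution (3 stride-2 stages => factor 8)
        return F.interpolate(logits, scale_factor=8, mode="bilinear",
                             align_corners=False), boundary


if __name__ == "__main__":
    model = TransCuesSketch()
    logits, boundary = model(torch.randn(1, 3, 256, 256))
    print(logits.shape, boundary.shape)  # (1, 12, 256, 256), (1, 1, 32, 32)
```

The auxiliary boundary map returned alongside the logits reflects the paper's premise that an explicit boundary signal can supervise and sharpen the segmentation of otherwise low-contrast transparent regions.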