DOTGraph: CLIP-Driven Feature Disentanglement and Optimal Transport-Based Graph Learning for Few-Shot Segmentation
Abstract
Few-shot semantic segmentation (FSS) aims to build robust models that can segment unseen object categories from only a few labeled examples. Existing FSS approaches, which rely on semantic feature matching, often suffer from Background Bias, Pose-Scale Discrepancy Bias, and an inability to capture fine object details. These limitations hinder generalization to novel categories, especially in scenarios with high intra-class variability and fine-grained object structures. To overcome these challenges, we propose DOTGraph, a novel framework that couples CLIP-driven feature Disentanglement with Optimal Transport-based Graph learning for robust few-shot segmentation. We evaluate DOTGraph on PASCAL-5^i and COCO-20^i, achieving state-of-the-art performance across multiple few-shot settings. Our results demonstrate that DOTGraph effectively mitigates background bias, improves feature alignment, and enhances fine-grained segmentation. The code will be released soon.