CORA: Consistency-Guided Semi-Supervised Framework for Reasoning Segmentation
Prantik Howlader · Hoang Nguyen-Canh · Srijan Das · Jingyi Xu · Hieu Le · Dimitris Samaras
Abstract
Reasoning segmentation is a powerful tool, yet its generalization remains limited due to the high cost of acquiring diverse, high-quality visual and linguistic supervision. In this work, we present CORA, a semi-supervised framework that jointly learns from limited labeled data and a large corpus of unlabeled images. To improve supervision from limited labeled data, CORA introduces conditional visual instructions that encode spatial and contextual relationships between objects. To exploit unlabeled data, we propose a VLM-guided output-consistency strategy that filters noisy pseudo-labels based on the stability of predictions across semantically equivalent queries. Additionally, we enforce token-level contrastive alignment between labeled and pseudo-labeled samples to enhance feature consistency. Together, these components enable CORA to perform robust reasoning segmentation with minimal supervision, outperforming existing baselines under constrained annotation settings. Our method achieves state-of-the-art results on Cityscapes, a benchmark dataset for urban scene understanding, surpassing the baseline by $+2.3\%$ with as few as 100 labeled images. Similarly, our approach improves performance by $+2.4\%$ with only 180 labeled images on PanNuke, a histopathology dataset.
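The output-consistency idea described above can be illustrated with a minimal sketch: run the segmenter on several paraphrases of the same query, measure how stable the predicted masks are (here via mean pairwise IoU, an assumed stability metric; the paper's exact criterion may differ), and discard the pseudo-label when agreement falls below a threshold. The function names and threshold `tau` are hypothetical, introduced only for illustration.

```python
import numpy as np

def pairwise_iou(masks):
    """Mean pairwise IoU over a list of binary masks of shape (H, W)."""
    ious = []
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            inter = np.logical_and(masks[i], masks[j]).sum()
            union = np.logical_or(masks[i], masks[j]).sum()
            ious.append(inter / union if union > 0 else 1.0)
    return float(np.mean(ious)) if ious else 1.0

def filter_pseudo_label(masks, tau=0.8):
    """Keep a pseudo-label only if predictions agree across
    semantically equivalent queries.

    masks : list of boolean arrays, one per paraphrased query.
    Returns a consensus mask (majority vote) if the predictions are
    stable, or None if they disagree and the sample should be dropped.
    """
    if pairwise_iou(masks) < tau:
        return None  # unstable across paraphrases -> treat as noisy
    # Majority vote over the per-query predictions as the consensus mask.
    return np.stack(masks).astype(np.float32).mean(axis=0) >= 0.5
```

In practice the retained consensus masks would serve as pseudo-labels for the unlabeled images, while rejected samples are simply excluded from the loss.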