Being Positive about Negative Queries: Exclusion-Aware Multimodal Retrieval Using Disentangled Representations
Abstract
The handling of exclusion in multimodal retrieval remains an underexplored challenge with significant implications for the accuracy and reliability of information retrieval systems. Although existing approaches have advanced multimodal understanding, they typically lack mechanisms to process exclusion explicitly. To address this, we propose ExclMM, a novel model that leverages disentangled representations to handle exclusion in multimodal retrieval effectively. Our approach enables precise differentiation between the presence and absence of specific elements in an image, outperforming existing methods. To rigorously evaluate our model, we construct ExcluCOCO, a dataset that pairs exclusion-based queries with ground-truth images sourced from MSCOCO. This dataset serves as a robust benchmark for assessing exclusion comprehension in multimodal contexts. By explicitly incorporating exclusion, our work advances multimodal retrieval, contributing both a model tailored to exclusion-aware retrieval and a benchmark to facilitate future research in this domain.
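To make the notion of exclusion-aware scoring concrete, the minimal Python sketch below ranks images against a query such as "a street without cars" by rewarding similarity to the positive part of the query and penalizing similarity to the excluded concept. This is only an illustration of the problem setting, not the paper's ExclMM model: the function names, the subtraction rule, and the weight alpha are hypothetical stand-ins for its disentangled-representation machinery, and random vectors stand in for encoder outputs.

import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def exclusion_score(query_pos, query_neg, image_emb, alpha=1.0):
    """Hypothetical exclusion-aware score: similarity to the positive
    part of the query minus a weighted penalty for similarity to the
    excluded concept."""
    return cosine(query_pos, image_emb) - alpha * cosine(query_neg, image_emb)

# Toy demo: random vectors stand in for text/image encoder embeddings.
rng = np.random.default_rng(0)
d = 8
street = rng.normal(size=d)   # embedding of the positive part, "a street"
cars = rng.normal(size=d)     # embedding of the excluded concept, "cars"
img_with_cars = 0.6 * street + 0.8 * cars + 0.05 * rng.normal(size=d)
img_without_cars = street + 0.05 * rng.normal(size=d)

for name, img in [("with cars", img_with_cars), ("without cars", img_without_cars)]:
    print(name, round(exclusion_score(street, cars, img), 3))

Because an image containing the excluded concept incurs the penalty term, the exclusion-compliant image typically receives the higher score; a retriever without such a mechanism would tend to rank both images by their similarity to "street" alone.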