Zero-Shot Domain Generalisation via Prompt-Driven Feature Refinement
Abstract
Domain generalisation aims to develop models that generalise from source domains to unseen target domains. However, most existing methods assume access to source-domain data and require additional training, which is not always practical. We focus on a more flexible and broadly applicable setting, zero-shot domain generalisation, in which models must generalise without access to source data, target data, or any additional training. In this work, we propose Prefer (prompt-driven feature refinement), a simple and effective approach that enhances the zero-shot domain generalisation ability of vision-language foundation models. Prefer generates a diverse set of textual prompts for each class by imagining domain-specific variations (e.g., "a painting of a cat under a golden sunset with thick brush strokes") and uses them to probe the model. We evaluate how reliably each feature channel represents a class across domains by measuring two quantities: (1) how strongly the channel aligns with the original class prompt (e.g., "a photo of a cat") across the generated domain-specific prompts, and (2) how stable the channel remains across those prompts, quantified by its variance. Channels that exhibit both high alignment and low variance are selected at inference time to improve class prediction under domain shift. Without any model updates or external data, Prefer achieves consistent improvements across domain generalisation benchmarks, outperforming state-of-the-art methods. The source code is available at https://anonymous.4open.science/r/WACV26-Prefer.
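To make the channel-scoring idea concrete, the sketch below probes a CLIP model (ViT-B/32) with a base class prompt and a few domain-specific variants, then scores each text-feature channel by (1) its average contribution to the cosine similarity with the base prompt and (2) its variance across the variants. This is a minimal illustration under stated assumptions: the hand-written variant prompts, the alignment proxy, the ratio-based score, and the top-k selection are all placeholders, not the authors' released implementation (see the linked repository for that).

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Base prompt for the class and imagined domain-specific variants
# (this short list is a hand-written stand-in for generated prompts).
base_prompt = "a photo of a cat"
domain_prompts = [
    "a painting of a cat under a golden sunset with thick brush strokes",
    "a rough pencil sketch of a cat on textured paper",
    "a low-light infrared photo of a cat at night",
]

with torch.no_grad():
    base = model.encode_text(clip.tokenize([base_prompt]).to(device)).float()
    variants = model.encode_text(clip.tokenize(domain_prompts).to(device)).float()

# L2-normalise so channel-wise products sum to the cosine similarity.
base = base / base.norm(dim=-1, keepdim=True)              # (1, D)
variants = variants / variants.norm(dim=-1, keepdim=True)  # (P, D)

# (1) Alignment: each channel's contribution to the cosine similarity
#     with the base prompt, averaged over the domain-specific prompts.
alignment = (variants * base).mean(dim=0)                  # (D,)

# (2) Stability: per-channel variance across the domain-specific prompts.
variance = variants.var(dim=0)                             # (D,)

# Keep channels with high alignment and low variance; the ratio score
# and the value of k are illustrative choices, not the paper's criterion.
k = 256
score = alignment / variance.clamp_min(1e-6)
selected = score.topk(k).indices                           # channels kept at inference
```

At inference time, one could restrict both image and text embeddings to `selected` (and renormalise) before computing class similarities, so that predictions rely only on the channels that behave stably under domain shift.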