DREAM: Dynamic Prompts and GuidedMix for Efficient Continual Adaptation of Vision-Language Models
Abstract
Vision-language models (VLMs) exhibit impressive zero-shot transfer, yet they remain static and cannot adapt when exposed to new tasks. Meanwhile, conventional continual learning methods adapt to new tasks but often overlook preserving this zero-shot capability. In this work, we present DREAM, a parameter-efficient framework that enables continual adaptation of VLMs while minimizing forgetting and preserving zero-shot performance. DREAM employs a dynamic prompt system with lightweight, task-specific parameters managed by two modules: a prompt composition module that dynamically generates prompts to adapt the VLM, and a query-key module that uses learned token weights to reliably activate the appropriate parameters at inference. To enhance robustness, we propose GuidedMix, which creates semantically meaningful mixed images and pairs them with mixture-aware text embeddings to strengthen representation learning through image-text alignment. We further leverage GuidedMix samples to estimate task-specific query-key similarity thresholds that identify samples from unseen tasks and prevent spurious prompt injection into the VLM, thereby safeguarding its zero-shot behavior. Experiments show that DREAM adapts efficiently, mitigates forgetting, and maintains strong zero-shot transfer with substantially fewer trainable parameters, with consistent gains even under partial supervision.
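To make the query-key mechanism concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of threshold-gated prompt selection, assuming a frozen image encoder that exposes token features; the class name, shapes, and the way thresholds are stored are illustrative assumptions only.

import torch
import torch.nn.functional as F

class QueryKeyPromptSelector(torch.nn.Module):
    """Illustrative sketch: per-task keys, prompts, and similarity thresholds.

    The query is a learned token-weighted pooling of the frozen encoder's
    tokens; if the best key similarity falls below that task's threshold,
    no prompt is applied and the frozen VLM runs in its zero-shot form.
    """

    def __init__(self, num_tasks: int, embed_dim: int, prompt_len: int, num_tokens: int):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(num_tasks, embed_dim))
        self.prompts = torch.nn.Parameter(torch.randn(num_tasks, prompt_len, embed_dim))
        self.token_weights = torch.nn.Parameter(torch.zeros(num_tokens))  # learned token weights
        # Per-task thresholds, assumed to be estimated offline (e.g. from mixed samples).
        self.register_buffer("thresholds", torch.full((num_tasks,), -1.0))

    def forward(self, image_tokens: torch.Tensor):
        """image_tokens: (B, num_tokens, embed_dim) from the frozen image encoder."""
        # Query = token-weighted pooling of the frozen encoder's token features.
        w = torch.softmax(self.token_weights, dim=0)                          # (num_tokens,)
        query = torch.einsum("t,btd->bd", w, image_tokens)                    # (B, embed_dim)

        # Cosine similarity between queries and task keys.
        sim = F.normalize(query, dim=-1) @ F.normalize(self.keys, dim=-1).T   # (B, num_tasks)
        best_sim, best_task = sim.max(dim=-1)

        # Gate: inject the selected prompt only if similarity clears the task threshold.
        use_prompt = best_sim >= self.thresholds[best_task]                   # (B,) bool
        return self.prompts[best_task], use_prompt, best_task

# Example usage with stand-in encoder features:
selector = QueryKeyPromptSelector(num_tasks=5, embed_dim=512, prompt_len=8, num_tokens=196)
tokens = torch.randn(4, 196, 512)
prompts, use_prompt, task_id = selector(tokens)
# Samples with use_prompt == False would be routed through the unmodified (zero-shot) VLM.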