Any Detector Can Detect Anything
Abstract
Visual prompt-based detection enables generalization to arbitrary novel instances in the target image by using one or a few visual templates.Previous methods rely on complex relation or explicit feature matching modules, and their designs are deeply coupled with specific detectors, greatly limiting their applicability.Instead, we propose our `Any Detector can Detect Anything' framework that can enable any detector to detect any object given a single or a few visual templates.Specifically, we design an adapter called Template-Aware Adapter that can be added on top of any existing detector architecture to inject visual template information directly into the detection features.After integration, localization is done on the feature maps as in standard object detectors, effectively transforming any detector into a visual prompt-based detector.Furthermore, we revisit current visual prompt detection benchmarks and correct their unrealistic test assumptions and class splits, which limit the usability of the developed algorithms in the real world.We introduce a set of realistic benchmarks to remedy these issues.We comprehensively evaluate the proposed model on both existing and our new benchmarks, outperforming current state-of-the-art one-shot and few-shot detection methods by a large margin.Our code and benchmark will be released.