Improving Out-of-Distribution Detection Using Segmented Images and Cross-View Attention Fusion
Abstract
Although out-of-distribution (OOD) detection has been studied extensively, it remains challenging when OOD data are semantically similar to in-distribution (ID) data. Part of the difficulty arises from the model's inability to learn sufficiently discriminative features for the ID classes. We propose to improve this by segmenting each input image into foreground and background views and combining them with the original input image (original view) in a multi-view learning approach. We present a novel method, CASOD (Cross-view Attention of Segmented views for OOD Detection), that learns more discriminative information from all three views and subsequently fuses them through a novel stacked cross-view attention mechanism to produce the final predictive feature representation. A feature-based OOD detection method is then applied to the fused representation, yielding substantial improvements over a range of strong baselines on various near- and far-OOD datasets. CASOD achieves state-of-the-art performance across experimental settings with challenging ID and OOD datasets. The CASOD codebase is submitted in the supplementary materials.
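To make the fusion step described above concrete, the following is a minimal, illustrative sketch of how a stacked cross-view attention module over three view features (original, foreground, background) could look in PyTorch. It is an assumption for exposition only, not the authors' CASOD implementation; all names (CrossViewFusion, embed_dim, depth, etc.) and design details (original-view tokens as queries, mean pooling, choice of OOD scorer) are hypothetical.

```python
# Hypothetical sketch of stacked cross-view attention fusion; not the CASOD code.
import torch
import torch.nn as nn


class CrossViewFusion(nn.Module):
    def __init__(self, embed_dim: int = 256, num_heads: int = 4, depth: int = 2):
        super().__init__()
        # Stack of cross-attention blocks: original-view tokens act as queries,
        # the segmented-view tokens (foreground + background) as keys/values.
        self.blocks = nn.ModuleList(
            nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
            for _ in range(depth)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(embed_dim) for _ in range(depth))

    def forward(self, orig: torch.Tensor, fg: torch.Tensor, bg: torch.Tensor) -> torch.Tensor:
        # orig, fg, bg: (batch, tokens, embed_dim) features from a shared backbone.
        context = torch.cat([fg, bg], dim=1)
        x = orig
        for attn, norm in zip(self.blocks, self.norms):
            fused, _ = attn(query=x, key=context, value=context)
            x = norm(x + fused)  # residual connection followed by layer norm
        # Pool tokens into a single fused feature on which a feature-based
        # OOD score (e.g. Mahalanobis or KNN distance) could be computed.
        return x.mean(dim=1)


if __name__ == "__main__":
    B, T, D = 2, 16, 256
    fusion = CrossViewFusion(embed_dim=D)
    z = fusion(torch.randn(B, T, D), torch.randn(B, T, D), torch.randn(B, T, D))
    print(z.shape)  # torch.Size([2, 256])
```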