PSA-MIL: A Probabilistic Spatial Attention-Based Multiple Instance Learning for Whole Slide Image Classification
Abstract
Whole Slide Images (WSIs) are high-resolution digital scans widely used in medical diagnostics. Due to their immense size, WSI classification is typically approached with Multiple Instance Learning (MIL), where a slide is partitioned into individual tiles, disrupting its spatial structure. Recent MIL methods often incorporate spatial context through rigid spatial assumptions (e.g., fixed kernels), which limit their ability to capture the intricate tissue structures crucial for accurate diagnosis. To address this limitation, we propose Probabilistic Spatial Attention MIL (PSA-MIL), a novel attention-based MIL framework that integrates spatial context into the attention mechanism through learnable distance-decayed priors, formulated within a probabilistic interpretation of self-attention as a posterior distribution. This formulation enables dynamic inference of spatial relationships during training, eliminating the need for the predefined assumptions imposed by previous approaches. Additionally, we introduce a diversity loss that encourages spatial variation among attention heads, ensuring each head captures a distinct representation. Furthermore, we address the computational challenge that long sequences, such as those in WSI analysis, pose for transformer-based architectures by introducing a spatial pruning strategy for the posterior, reducing computational costs while maintaining performance. Together, these components make PSA-MIL a more data-driven and adaptive integration of spatial context, moving beyond predefined constraints. Extensive experiments on multiple datasets and tasks demonstrate that our method outperforms both contextual and non-contextual models, setting a new state of the art while significantly reducing computational costs.
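To make the core idea of distance-decayed spatial priors within attention concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a PyTorch setting, a hypothetical `DistanceDecayedAttention` module, and a simple parameterization in which each head learns a decay rate that penalizes attention logits by pairwise tile distance; the combined logit acts like a log-likelihood (query-key score) plus a log-prior before the softmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistanceDecayedAttention(nn.Module):
    """Illustrative sketch (not the paper's code): multi-head self-attention
    whose logits are biased by a learnable, per-head distance-decayed prior
    over tile coordinates, so nearby tiles receive higher prior weight."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # One learnable decay rate per head (assumed parameterization).
        self.log_decay = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) tile features; coords: (N, 2) tile grid coordinates.
        n, dim = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(n, self.num_heads, self.head_dim).transpose(0, 1)  # (H, N, d)
        k = k.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        v = v.view(n, self.num_heads, self.head_dim).transpose(0, 1)

        # Pairwise Euclidean distances between tiles: (N, N).
        dist = torch.cdist(coords.float(), coords.float())
        # Log-prior: attention decays with distance at a learnable, per-head rate.
        log_prior = -F.softplus(self.log_decay).view(-1, 1, 1) * dist  # (H, N, N)

        # Posterior-style attention: query-key score plus log-prior, then softmax.
        logits = q @ k.transpose(-2, -1) / self.head_dim ** 0.5 + log_prior
        attn = logits.softmax(dim=-1)
        out = (attn @ v).transpose(0, 1).reshape(n, dim)
        return self.proj(out)


# Usage on dummy data: 256 tiles with 512-dim features and (row, col) positions.
features = torch.randn(256, 512)
positions = torch.randint(0, 50, (256, 2))
attended = DistanceDecayedAttention(dim=512)(features, positions)  # (256, 512)
```

In this sketch, heads with a large decay rate concentrate on nearby tiles while heads with a small decay rate attend more globally; a diversity loss on the per-head decay rates (or on the resulting attention maps) and a distance-based pruning of low-prior pairs would be added on top, as described in the abstract.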