FLoMo-Net: A Novel Task-Adaptive Mixture of Experts Routing Framework with Frequency and Uncertainty Correction for Medical Image Segmentation
Abstract
Medical image segmentation (MIS) is challenged by anatomical variability, ambiguous boundaries, and subtle textures, and therefore demands an efficient balance between fine local detail and global context. Existing architectures often fuse spatial and frequency-domain features suboptimally, which limits their ability to capture rich structural and textural representations. To address these challenges, we introduce FLoMo-Net, an MIS architecture designed to achieve both high accuracy and efficiency by adaptively routing multi-scale local and global context information. The proposed Local-Global Mixture of Experts encoder dynamically integrates specialized convolutional branches to selectively capture relevant spatial scales, while the Dual-Attention Selective Aggregator module refines deep encoder and decoder features by combining frequency-guided channel attention with adaptive spatial attention. In addition, the Frequency-Aware Multi-Scale Refinement module enhances structural precision by explicitly modeling frequency-domain features at the bridge of the architecture. Our False Positive/Negative Corrective Attention Module leverages uncertainty measures derived from entropy and cosine dissimilarity to counter semantic drift and improve boundary delineation in the decoder stages. Extensive experiments on four benchmark MIS datasets demonstrate that FLoMo-NetB2 outperforms state-of-the-art counterparts in both Dice score and inference speed while using significantly fewer parameters. Moreover, our architecture scales effectively: FLoMo-NetB0 (2.06M parameters) and FLoMo-NetB1 (7.99M parameters) deliver competitive results, underscoring the practical viability of our design for real-time clinical applications.
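To make the uncertainty measures mentioned above concrete, the following is a minimal illustrative sketch, not the FLoMo-Net implementation. It assumes a binary foreground probability per pixel and two per-pixel feature vectors (for example, an encoder skip feature and a decoder feature). The function names, the mixing weight `alpha`, and the way the two terms are combined are all hypothetical choices made for illustration only.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Shannon entropy (in bits) of a foreground probability p in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def cosine_dissimilarity(u, v, eps=1e-12):
    """1 - cosine similarity between two feature vectors; lies in [0, 2]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + eps)


def uncertainty_map(probs, feats_a, feats_b, alpha=0.5):
    """Per-pixel uncertainty as a weighted sum of entropy and feature disagreement.

    probs   : H x W nested list of foreground probabilities
    feats_a : H x W nested list of feature vectors (assumed: encoder features)
    feats_b : H x W nested list of feature vectors (assumed: decoder features)
    alpha   : hypothetical mixing weight between the two terms
    """
    out = []
    for row_p, row_a, row_b in zip(probs, feats_a, feats_b):
        out_row = []
        for p, fa, fb in zip(row_p, row_a, row_b):
            h = binary_entropy(p)                      # in [0, 1] bits
            d = 0.5 * cosine_dissimilarity(fa, fb)     # rescaled to [0, 1]
            out_row.append(alpha * h + (1.0 - alpha) * d)
        out.append(out_row)
    return out
```

Under these assumptions, a pixel with probability 0.5 and orthogonal feature vectors receives a higher score than a confidently predicted pixel whose features agree, so such a map could be used to weight attention toward likely false positives or false negatives near boundaries.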