Hymavi: A Hybrid Mamba-Attention Network in a Multi-View Framework for Volumetric Medical Image Segmentation
Abstract
Volumetric medical image segmentation remains a challenging and critical task in both clinical and research settings due to the inherent complexity of anatomical structures, modality-specific variability, and the need to capture both fine-grained local details and long-range spatial dependencies across 3D volumes. To address these challenges, we propose Hymavi, a novel hybrid architecture that combines Mamba-based sequence modeling with attention mechanisms in a parallel design. This dual-branch structure enables Hymavi to simultaneously leverage the high-resolution spatial reasoning of attention and the efficient global context modeling afforded by Mamba's recurrent-style state-space architecture. In addition to these architectural innovations, Hymavi incorporates a multi-view learning strategy that exploits the sagittal and coronal perspectives alongside the conventional axial view. This multi-view fusion enriches the volumetric representation by integrating complementary anatomical information from different orientations, allowing the network to better capture inter-slice continuity and organ-specific variation. Extensive experiments on three widely used benchmark datasets (ACDC, BraTS2023, and AMOS22) demonstrate the effectiveness and strong generalization ability of our method across diverse segmentation tasks and imaging modalities. These results underscore the potential of Hymavi as a powerful tool for advancing automated medical image analysis. The code is publicly available at https://anonymous.4open.science/r/Hymavi_segmentation-C78E.
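The parallel dual-branch design described above can be sketched in miniature. The snippet below is an illustrative NumPy toy, not the actual Hymavi implementation: it pairs a single-head self-attention branch (with identity Q/K/V projections for brevity) with a simplified diagonal linear state-space recurrence standing in for Mamba's selective scan (whose input-dependent parameters are omitted here), and fuses the two outputs by addition. All function names, the fixed `decay` constant, and the additive fusion are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_branch(x):
    # x: (seq_len, dim). Single-head self-attention; identity Q/K/V
    # projections for brevity (a real block would learn these weights).
    scores = softmax(x @ x.T / np.sqrt(x.shape[-1]))
    return scores @ x

def ssm_branch(x, decay=0.9):
    # Simplified diagonal linear recurrence h_t = a * h_{t-1} + x_t,
    # a stand-in for Mamba's selective scan: global context accumulates
    # sequentially at O(seq_len) cost rather than attention's O(seq_len^2).
    h = np.zeros(x.shape[-1])
    out = np.empty_like(x)
    for t, xt in enumerate(x):
        h = decay * h + xt
        out[t] = h
    return out

def hybrid_block(x):
    # Parallel dual-branch fusion: local attention plus global SSM context,
    # combined by simple addition (a learned gate/projection in practice).
    return attention_branch(x) + ssm_branch(x)

# Toy input: 16 tokens (e.g. flattened slice patches) of dimension 8.
tokens = np.random.default_rng(0).normal(size=(16, 8))
y = hybrid_block(tokens)
print(y.shape)  # (16, 8)
```

In a multi-view setting, such a block would be applied to token sequences drawn from axial, sagittal, and coronal reslicings of the same volume (e.g. via `np.transpose` of the 3D array) before fusing the per-view features.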