HyPCA-Net: Advancing Multimodal Fusion in Medical Image Analysis
Abstract
Multimodal fusion frameworks, which integrate diverse medical imaging modalities (e.g., MRI, CT), have shown great potential in applications such as skin cancer detection, dementia diagnosis, and brain tumor prediction. However, existing multimodal fusion methods face significant challenges. First, they often rely on computationally expensive models, limiting their applicability in low-resource environments. Second, they often employ cascaded attention modules, which potentially increase the risk of information loss during inter-module transitions and hinder their capacity to effectively capture robust shared representations across modalities. This restricts their generalization in multi-disease analysis tasks. To address these limitations, we propose a Hybrid Parallel-Fusion Cascaded Attention Network (HyPCA-Net), composed of two core blocks: (a) an efficient residual adaptive learning attention block for capturing refined modality-specific representations, and (b) a dual-view cascaded attention block aimed at learning robust shared representations across diverse modalities. Extensive experiments on ten publicly available datasets demonstrate that HyPCA-Net significantly outperforms existing methods, achieving performance improvements of up to 9.34%, while reducing computational costs by up to 78.3%.
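To make the two-block design concrete, the following is a minimal structural sketch in PyTorch of how parallel modality-specific refinement could feed a dual-view fusion stage. All module internals, layer sizes, the channel-gating mechanism, and the bidirectional cross-attention fusion are illustrative assumptions, not the authors' implementation; only the overall parallel-then-fused arrangement of blocks (a) and (b) follows the abstract.

```python
# Hypothetical sketch of the HyPCA-Net block arrangement; details are assumed.
import torch
import torch.nn as nn


class ResidualAdaptiveAttention(nn.Module):
    """Assumed stand-in for block (a): refines one modality's features with a
    lightweight channel gate and adds them back through a residual connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.gate = nn.Sequential(  # channel-wise "adaptive" attention (assumption)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        refined = self.refine(x)
        return x + refined * self.gate(refined)  # residual refinement


class DualViewCascadedAttention(nn.Module):
    """Assumed stand-in for block (b): cross-attends the two modality streams in
    both directions and pools them into a shared representation."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn_ab = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Flatten spatial dims into token sequences: (B, C, H, W) -> (B, H*W, C).
        a_seq = a.flatten(2).transpose(1, 2)
        b_seq = b.flatten(2).transpose(1, 2)
        shared_a, _ = self.attn_ab(a_seq, b_seq, b_seq)  # view 1: a attends to b
        shared_b, _ = self.attn_ba(b_seq, a_seq, a_seq)  # view 2: b attends to a
        return (shared_a + shared_b).mean(dim=1)         # pooled shared features


class HyPCANetSketch(nn.Module):
    """Parallel modality-specific branches followed by shared-representation fusion."""
    def __init__(self, channels: int = 64, num_classes: int = 2):
        super().__init__()
        self.branch_a = ResidualAdaptiveAttention(channels)  # e.g., MRI stream
        self.branch_b = ResidualAdaptiveAttention(channels)  # e.g., CT stream
        self.fusion = DualViewCascadedAttention(channels)
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, mod_a: torch.Tensor, mod_b: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.fusion(self.branch_a(mod_a), self.branch_b(mod_b)))


if __name__ == "__main__":
    model = HyPCANetSketch()
    mod_a = torch.randn(2, 64, 32, 32)  # toy pre-extracted feature maps
    mod_b = torch.randn(2, 64, 32, 32)
    print(model(mod_a, mod_b).shape)    # torch.Size([2, 2])
```

The design intent reflected here is that the two branches run in parallel rather than as a single long attention cascade, so each modality's refined features reach the fusion stage directly, limiting the inter-module information loss the abstract describes.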