LASOR: Towards Clinically Transparent and Explainable Ophthalmic Report Generation via Lesion-Aware Segmentation
Jian Park · Hyunseon Won · JeeEun Kim · Joon Hwang · Jeong Han · Ji Park · Daniel Hwang · Jinyoung Han
Abstract
Automated ophthalmic report generation aims to reduce the diagnostic burden on retinal specialists by producing clinically accurate and standardized descriptions from medical imaging. However, current research remains predominantly fundus-centric and rarely exploits OCT-derived spatial evidence, which limits clinical transparency by obscuring which anatomical regions drive diagnostic decisions. To address these limitations, we propose $\textbf{LASOR}$ ($\textbf{L}$esion-$\textbf{A}$ware $\textbf{S}$egmentation-Guided $\textbf{O}$phthalmic $\textbf{R}$eport Generation), which extracts multi-scale features to robustly capture both small focal abnormalities and broader anatomical structures, generating reliable segmentation masks that serve as spatial priors for report generation. Specifically, we use a lesion-aware patch weighting module to emphasize abnormal regions, and we leverage a curated instruction dataset incorporating spatial mask information to enhance the diagnostic capabilities of the proposed model. In addition, we introduce a mask-guided cross-modal consistency loss that strengthens vision–language alignment between pathological regions and their diagnostic descriptions. In extensive experiments on a retinal OCT dataset covering twenty pathological conditions, LASOR achieves state-of-the-art performance, underscoring its potential to advance clinically transparent ophthalmic report generation systems.
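The abstract does not specify how the lesion-aware patch weighting module is computed. As a rough illustration only, the sketch below assumes one simple possibility: each image patch receives a weight that grows with the fraction of its pixels marked as lesion in a binary segmentation mask, and patch features are then pooled with those weights. The function names `patch_lesion_weights` and `weighted_pool`, the `floor` parameter, and the weighting rule itself are hypothetical and are not taken from the paper.

```python
def patch_lesion_weights(mask, patch, floor=0.1):
    """Hypothetical weighting rule (an assumption, not the paper's module).

    mask  : 2D list of 0/1 values, 1 = lesion pixel.
    patch : side length of the square, non-overlapping patches.
    floor : small constant so lesion-free patches keep a nonzero weight.
    Returns one weight per patch (row-major order), normalised to sum to 1.
    """
    rows, cols = len(mask), len(mask[0])
    if rows % patch or cols % patch:
        raise ValueError("mask dimensions must be multiples of the patch size")

    raw = []
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            # Fraction of lesion pixels inside this patch.
            lesion = sum(
                mask[i][j]
                for i in range(r, r + patch)
                for j in range(c, c + patch)
            )
            raw.append(floor + lesion / (patch * patch))

    total = sum(raw)
    return [w / total for w in raw]


def weighted_pool(features, weights):
    """Weighted average of per-patch feature vectors (lists of floats)."""
    dim = len(features[0])
    return [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(dim)
    ]


# Toy example: a 4x4 mask whose top-left 2x2 patch is entirely lesion.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
weights = patch_lesion_weights(mask, patch=2)
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
pooled = weighted_pool(features, weights)
```

In this toy case the lesion patch dominates the pooled representation, which is the intuition behind emphasizing abnormal regions. A learned module would presumably replace the fixed rule used here.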