Ordinal-Aware Multimodal Engagement Recognition for Collaborative Learning
Abstract
Assessing student engagement is critical for collaborative learning but remains a challenging task. Existing approaches often rely on controlled laboratory or online settings, which fail to capture the complexity of real-world classrooms. Furthermore, current datasets are scarce and rarely provide both individual- and group-level annotations, limiting the development of robust and generalizable models. To address these gaps, we propose CORE-Net, a multimodal architecture that integrates context modeling to capture group-level dynamics and ordinal supervision to account for the ordered nature of engagement levels. We also present COLER, a large-scale dataset collected in authentic classroom environments with rich annotations at both the individual and group levels. Experiments demonstrate that CORE-Net achieves 89.63% accuracy and a quadratic weighted kappa (QWK) of 94.80, significantly outperforming strong baselines such as BlockGCN and MoViNet. Ablation studies further highlight the critical roles of both context modeling and ordinal supervision. Our work establishes a robust and scalable foundation for automated engagement assessment, supporting timely feedback and enhancing the effectiveness of collaborative learning.
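The QWK metric reported above is the standard agreement measure for ordinal labels: it penalizes disagreements by the squared distance between predicted and true levels. A minimal pure-Python sketch follows; the four-level label scheme and the example predictions are illustrative assumptions, not data from the COLER dataset.

```python
# Hedged sketch: quadratic weighted kappa (QWK) for ordinal predictions.
# Labels are illustrative engagement levels (0 = disengaged ... 3 = highly
# engaged), not actual COLER annotations.
from collections import Counter

def quadratic_weighted_kappa(y_true, y_pred, num_levels):
    n = len(y_true)
    # Observed confusion counts O[i][j]
    obs = [[0.0] * num_levels for _ in range(num_levels)]
    for t, p in zip(y_true, y_pred):
        obs[t][p] += 1
    # Expected counts E[i][j] from the marginal label distributions
    hist_t = Counter(y_true)
    hist_p = Counter(y_pred)
    num = den = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            w = (i - j) ** 2 / (num_levels - 1) ** 2  # quadratic penalty
            e = hist_t[i] * hist_p[j] / n
            num += w * obs[i][j]
            den += w * e
    return 1.0 - num / den

y_true = [0, 1, 2, 3, 2, 1, 0, 3]
y_pred = [0, 1, 2, 2, 2, 1, 1, 3]
print(quadratic_weighted_kappa(y_true, y_pred, 4))  # → 0.875
```

Unlike plain accuracy, QWK rewards predictions that are merely one level off more than those several levels off, which matches the ordinal supervision motivating CORE-Net.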