LightGazeNet: A Lightweight GNN-based Architecture for Gaze Estimation
Abstract
Gaze estimation remains a fundamental yet challenging task, requiring a careful balance between accuracy and efficiency for real-world deployment. We introduce LightGazeNet, a lightweight Graph Neural Network (GNN) framework designed for appearance-based gaze estimation. LightGazeNet effectively integrates multi-modal inputs—including facial features, eye cues, 3D eye centers, head pose, and calibration data—within a compact graph-based architecture. To enhance feature fusion across heterogeneous inputs, the framework leverages multi-head attention to model complex spatial dependencies. Extensive evaluations on multiple benchmark datasets show that LightGazeNet achieves competitive or superior accuracy with significantly fewer parameters than existing methods. Furthermore, it demonstrates strong cross-dataset generalization, with calibration-based adaptation improving robustness under domain shift. By combining accuracy, efficiency, and adaptability, LightGazeNet offers a practical solution for gaze estimation in real-world settings while advancing graph-based modeling in computer vision.
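The abstract's fusion mechanism — multi-head attention over heterogeneous inputs treated as graph nodes — can be sketched at a high level. The snippet below is an illustrative toy, not the authors' implementation: the number of nodes, embedding width, head count, and the plain-NumPy attention are all assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

def multi_head_attention(X, n_heads, Wq, Wk, Wv, Wo):
    """Standard multi-head self-attention over node features X (n_nodes, d)."""
    n, d = X.shape
    dh = d // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(dh)         # scaled dot-product
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)                   # row-wise softmax
        heads.append(w @ V[:, s])
    return np.concatenate(heads, axis=1) @ Wo

d, n_heads = 16, 4
# Hypothetical per-modality embeddings, one graph node each (shapes assumed):
# face, left eye, right eye, 3D eye centers, head pose, calibration.
nodes = rng.standard_normal((6, d))
Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) * 0.1 for _ in range(4))
fused = multi_head_attention(nodes, n_heads, Wq, Wk, Wv, Wo)

# Pool node features and regress a 2D gaze direction (e.g. yaw, pitch).
W_out = rng.standard_normal((d, 2)) * 0.1
gaze = fused.mean(axis=0) @ W_out
print(gaze.shape)
```

In a trained model the projection matrices would be learned, and the attention weights let each modality node (e.g. the eye-crop embedding) attend to complementary cues such as head pose or calibration data.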